Remote Forward Deployed Engineer - AI Inference
About the Role
We are seeking a Remote Forward Deployed Engineer - AI Inference to join the vLLM and LLM-D Engineering team at Red Hat. In this role, you will do more than build software: you will be the bridge between our cutting-edge inference platform (LLM-D and vLLM) and our customers' most critical production environments. You will work directly with our customers' engineering teams to deploy, optimize, and scale distributed Large Language Model (LLM) inference systems, solving "last mile" infrastructure challenges that defy off-the-shelf solutions and ensuring that massive models run with low latency and high throughput on complex Kubernetes clusters.
What You'll Do
- Orchestrate Distributed Inference: Deploy and configure LLM-D and vLLM on Kubernetes clusters, setting up advanced deployments like disaggregated serving and KV-cache aware routing.
- Optimize for Production: Go beyond standard deployments by running performance benchmarks, tuning vLLM parameters, and configuring intelligent inference routing policies to meet SLOs for latency and throughput.
- Code Side-by-Side: Collaborate with customer engineers to write production-quality code (Python/Go/YAML) that integrates our inference engine into their existing Kubernetes ecosystem.
- Solve the "Unsolvable": Debug complex interactions between model architectures, hardware accelerators, and Kubernetes networking.
- Feedback Loop: Act as the "Customer Zero" for our core engineering teams, channeling field learnings back to product development.
Requirements
- 8+ years of engineering experience in Backend Systems, SRE, or Infrastructure Engineering.
- Deep Kubernetes expertise, fluent in K8s primitives and experienced with stateful workloads and high-performance networking.
- Proficiency in Python and Go for systems programming.
- Experience with Infrastructure as Code tools like Helm and Terraform.
- Understanding of AI inference, including KV Caching and continuous batching in vLLM.
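Continuous batching, referenced in the last requirement, can be illustrated with a toy scheduler (a conceptual sketch only; vLLM's real scheduler also manages KV-cache blocks, preemption, and more): finished sequences leave the batch after each decode step and waiting requests join immediately, instead of the whole batch draining before new work is admitted.

```python
from collections import deque

# Toy continuous-batching loop (conceptual sketch, not vLLM's scheduler).
# Each request needs `tokens` decode steps; slots free up per step,
# not per batch.
def continuous_batching(requests, max_batch=4):
    waiting = deque(requests)        # (request_id, tokens_remaining)
    running = []
    steps = 0
    completed = []
    while waiting or running:
        # Admit new requests as soon as batch slots are free.
        while waiting and len(running) < max_batch:
            running.append(list(waiting.popleft()))
        steps += 1
        for seq in running:
            seq[1] -= 1              # one decode step for every running sequence
        completed.extend(s[0] for s in running if s[1] == 0)
        running = [s for s in running if s[1] > 0]
    return steps, completed

steps, order = continuous_batching([("a", 3), ("b", 1), ("c", 5), ("d", 1), ("e", 2)])
# steps == 5: "e" is admitted at step 2, as soon as "b" and "d" finish,
# whereas static batching would wait for the full first batch to drain.
```

Reasoning in these terms about batch occupancy and admission is central to tuning throughput against latency SLOs.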
Nice to Have
- Experience contributing to open-source AI infrastructure projects.
- Knowledge of Envoy Proxy or Inference Gateway (IGW).
- Familiarity with model optimization techniques like Quantization.
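For candidates less familiar with quantization, the core idea fits in a few lines. This is a toy symmetric int8 round-to-nearest sketch only; production methods such as GPTQ or AWQ are considerably more involved:

```python
# Toy symmetric int8 weight quantization (conceptual sketch only).
# Weights are scaled so the largest magnitude maps to 127, then rounded.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-to-nearest bounds the reconstruction error by half a
# quantization step (scale / 2) per weight.
```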
What We Offer
- Comprehensive medical, dental, and vision coverage.
- 401(k) with employer match.
- Paid time off and holidays.
- Flexible Spending Account for healthcare and dependent care.
- Paid parental leave plans for all new parents.
This position offers a unique opportunity to work on cutting-edge AI inference technology at Red Hat in a collaborative, open-source environment, backed by a competitive salary and comprehensive benefits.
Who Will Succeed Here
- Proficient in Python and Go, with hands-on experience deploying AI inference workloads on Kubernetes and integrating them into varied production environments.
- A strong problem-solving mindset focused on inference performance: adept at troubleshooting complex systems remotely and ensuring high availability and reliability for customer-facing applications.
- Demonstrated experience managing cloud infrastructure with Helm and Terraform in a remote setting, with the self-motivation to collaborate effectively across distributed teams.