Remote Software Engineer - Agent Infrastructure
About the Role
We are seeking a Remote Software Engineer to join the Agent Infrastructure team at OpenAI. In this role, you will be responsible for building systems that enable the training and deployment of highly useful AI agents. You will collaborate closely with researchers to design and scale the environment in which agentic models are trained, providing a workspace for AI models to execute code, debug issues, and develop software just like human software engineers do. This Remote Software Engineer position offers the chance to work on some of the most challenging technical problems in the AI field, focusing on the infrastructure layer that supports the capabilities and utility of agentic products.
What You'll Do
- Push massive compute clusters to their limits, contributing to a novel container orchestration platform built in-house by our team.
- Develop and maintain FastAPI and gRPC APIs that serve as the interface for our agentic infrastructure used in both training and production.
- Use Terraform to stand up and evolve complex infrastructure for both research and production environments.
- Collaborate with research teams to optimize systems for novel AI training runs and experimental applications.
- Build new capabilities to train more complex agentic models and rapidly scale these capabilities to some of the largest compute clusters in the world.
Requirements
- Deep experience working on large-scale machine learning infrastructure.
- Ability to identify bottlenecks and engineer solutions to optimize system performance in training environments.
- Experience building new systems from 0-1 quickly and scaling them significantly.
- Strong knowledge of cloud platforms and infrastructure-as-code technologies like Terraform.
- Technical expertise in virtualization and containerization technologies (e.g., Kata, Firecracker, gVisor, Sysbox).
Nice to Have
- Experience with optimizing runtime performance in complex, globally-distributed systems.
- Passion for solving complex, ambiguous problems at the intersection of infrastructure scalability, virtualization efficiency, and agentic capabilities.
What We Offer
- A competitive salary range of $255,000 to $405,000 per year.
- Relocation assistance for new employees.
- A hybrid work model with flexibility in work hours.
- Opportunities to work on cutting-edge AI technologies.
- A collaborative and inclusive work environment that values diverse perspectives.
This role offers a unique opportunity to work at the forefront of AI technology, tackling complex challenges in a collaborative environment. OpenAI is committed to innovation and inclusivity.
Who Will Succeed Here
Proficient in Python and experienced with FastAPI and gRPC for building scalable APIs, enabling seamless interaction between agentic models and the infrastructure.
Self-motivated and disciplined to work effectively in a remote setting, demonstrating strong time management skills to balance tasks and collaborate asynchronously with researchers.
Hands-on experience with Terraform and Kubernetes for infrastructure as code and container orchestration, showcasing a mindset focused on automation and efficient resource management.
Learning Resources
Career Path
Market Overview
Skills & Requirements
Domain Trends
Industry News
Loading latest industry news...
Finding relevant articles from the last 6 months