Senior Software Engineer - Machine Learning Infrastructure
About the Role
We're hiring a Senior Software Engineer - Machine Learning Infrastructure to join our innovative team at AGI, Inc. This remote role focuses on building the infrastructure necessary for machine learning workflows, ensuring that our AI systems are reliable and efficient. If you're passionate about pushing the boundaries of AI and want to make a significant impact, this is the opportunity for you.
What You'll Do
- Design and implement robust CI/CD pipelines for machine learning workflows, automating training runs with a focus on reliability.
- Build scalable evaluation harnesses that benchmark models automatically, optimizing performance and resource usage.
- Develop internal SDKs, CLIs, and lightweight UIs that empower researchers to visualize model failures and manage datasets.
- Implement comprehensive tracking for model latency, throughput, and error rates to ensure system performance.
- Build dashboards and alerting systems for real-time visibility into system health and performance.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
- 3+ years of experience in software engineering, MLOps, or ML infrastructure, with a strong focus on Python.
- Experience building internal developer tools, CLIs, or dashboards.
- Familiarity with cloud infrastructure (AWS or GCP) and containerization technologies (Docker, Kubernetes).
Nice to Have
- Experience designing CI/CD pipelines specifically for ML workflows.
- Familiarity with LLM serving stacks such as vLLM or TGI.
- Experience managing GPU clusters and optimizing distributed workloads.
What We Offer
- Competitive company-sponsored medical, dental, and vision insurance.
- Top-tier relocation and immigration support.
- Opportunities for professional growth and development.
- A collaborative and innovative work environment.
- Flexible work hours to promote work-life balance.
Why This Role Matters
The Senior Software Engineer - Machine Learning Infrastructure role is crucial for scaling our research efforts. Great infrastructure multiplies the impact of every researcher, and your contributions will directly shape the speed and quality of our progress toward everyday AGI.
How To Apply
If you're interested in this remote position, please send us a link or a 60-second video showcasing something you've built and explaining why it matters. Include your resume or LinkedIn profile and two sentences about the hardest problem you've solved. We aim to respond to every exceptional candidate within 48 hours.
This Senior Software Engineer role offers a unique opportunity to work on cutting-edge machine learning infrastructure in a fully remote setting. With competitive benefits and a focus on innovation, this position stands out in the AI industry.