Senior Machine Learning Infrastructure Engineer - Remote
About the Role
We are seeking a Senior Machine Learning Infrastructure Engineer to join our innovative team at Audiience. This fully remote position allows you to work from anywhere while collaborating with a talented group of engineers dedicated to transforming the publishing industry through AI. As a Senior Machine Learning Infrastructure Engineer, you will play a crucial role in building and maintaining the infrastructure that supports our machine learning initiatives.
What You'll Do
- Architect and own the end-to-end ML training infrastructure, ensuring efficiency from data ingestion to model checkpointing.
- Build scalable, reproducible training pipelines that empower the research team to iterate quickly and effectively.
- Manage compute orchestration, distributed training setups, and GPU clusters to optimize performance.
- Implement and oversee experiment tracking using tools like W&B and MLflow, ensuring data integrity and version control.
- Collaborate closely with research teams to translate experimental approaches into production-ready pipelines.
- Identify and systematically eliminate bottlenecks in training speed, cost, and reliability, enhancing overall system performance.
Requirements
- Deep experience with distributed training frameworks such as FSDP, DeepSpeed, or Megatron.
- Strong proficiency in PyTorch and modern machine learning tooling.
- Experience with cloud compute orchestration (AWS, GCP, or Azure) at scale.
- Familiarity with containerization technologies like Docker and Kubernetes for ML workloads.
- Solid understanding of ML experiment tracking, data versioning, and reproducibility practices.
- Ability to profile and optimize training throughput and resource utilization effectively.
Nice to Have
- Experience with CUDA, fused kernels, or low-level performance optimization.
- Prior work with large language models (LLMs) or large-scale foundation model training.
- Experience building internal ML platforms or developer tooling for research teams.
- Open-source contributions or published engineering write-ups.
- Previous startup or early-stage engineering experience.
What We Offer
- Competitive salary and benefits package, including equity options.
- Generous time off to recharge and flexible working hours.
- Opportunity to work at the forefront of AI infrastructure, solving extraordinary problems.
- A collaborative environment where your contributions are valued and impactful.
- Continuous learning opportunities with a team that challenges and sharpens your skills.
If you're ready to take on the challenge of building something the publishing market has never had, we want to hear from you. Apply now for the Senior Machine Learning Infrastructure Engineer position and join us in redefining what's possible in publishing AI.