Remote Principal Software Engineer - AI Infrastructure
About the Role
We are seeking a Remote Principal Software Engineer to join our innovative team at NVIDIA Dynamo. This role focuses on building scalable AI infrastructure for large language models and reasoning systems. You will be part of a dynamic team dedicated to addressing the most challenging issues in distributed AI infrastructure, ensuring high-performance AI inference for demanding applications.
What You'll Do
- Collaborate on the design and development of the Dynamo Kubernetes stack.
- Introduce new features to the Dynamo Python SDK and Rust Runtime Core Library.
- Design, implement, and optimize distributed inference components in Rust and Python.
- Contribute to the development of disaggregated serving for various inference engines.
- Improve intelligent routing and KV-cache management subsystems.
- Engage with the open-source community, participate in code reviews, and assist with issue triage on GitHub.
- Write clear documentation and contribute to user and developer guides.
Requirements
- BS/MS or higher in computer engineering, computer science, or a related field.
- 15+ years of proven experience in software engineering, particularly in systems programming.
- Strong proficiency in Rust and/or C++, with experience in Python for workflow and API development.
- Experience with Go for developing Kubernetes controllers and operators.
- Deep understanding of distributed systems, parallel computing, and GPU architectures.
- Experience with cloud-native deployment and container orchestration (Kubernetes, Docker).
- Familiarity with open-source development workflows (GitHub, CI/CD).
- Excellent problem-solving and communication skills.
Nice to Have
- Prior contributions to open-source AI inference frameworks.
- Experience with GPU resource scheduling, cache management, or high-performance networking.
- Understanding of LLM-specific inference challenges.
What We Offer
- Highly competitive salary ranging from $272,000 to $431,250 based on experience and location.
- Equity options available.
- Comprehensive benefits package including health, wellness, and retirement plans.
- Remote work flexibility with a focus on work-life balance.
- Opportunities for professional development and growth within a leading tech company.
This Remote Principal Software Engineer position at NVIDIA offers a chance to lead innovative AI infrastructure projects with a competitive salary and equity options.
Who Will Succeed Here
- Deep expertise in Rust, C++, and Python for developing high-performance AI infrastructure, with a focus on efficient memory management and concurrency in distributed systems.
- Strong self-motivation and proactive problem-solving skills, essential for thriving in a fully remote environment while collaborating with cross-functional teams on complex projects.
- Extensive experience in Kubernetes and Docker for container orchestration and deployment of AI applications, along with a solid understanding of GPU architecture to optimize performance.