Remote Senior Staff Software Engineer - Reliability Focus
About the Role
Join NMI as a Remote Senior Staff Software Engineer and play a pivotal role in shaping our Reliability Engineering function. This position is designed for experienced professionals who are passionate about enhancing the reliability, performance, and operational maturity of critical platform services. As a key member of our Reliability Engineering team, you will help transition our engineering organization from reactive incident response to a proactive, engineered reliability approach.
What You'll Do
- Design and build reliability-focused frameworks, tooling, and standards that enhance platform uptime, performance, and operational confidence.
- Drive initiatives that shift reliability from reactive response to proactive engineering, emphasizing prevention, early detection, and fast recovery.
- Partner with engineering teams to embed reliability into system design, development practices, and deployment workflows.
- Establish and evolve observability practices, including metrics, logging, alerting, and dashboards that provide clear operational insight.
- Identify systemic risks and failure patterns, leading efforts to address them through automation, architectural improvements, and process refinement.
- Contribute hands-on to production codebases, internal tools, and platform services with a focus on long-term maintainability.
- Influence technical direction across teams through design reviews, technical proposals, and clear written communication.
- Improve operational maturity through better incident practices, post-incident learning, and continuous improvement loops.
- Mentor engineers by modeling strong ownership, technical judgment, and disciplined delivery.
- Participate in on-call rotations, with a clear mandate to reduce operational load over time through engineering.
Requirements
- 8+ years of experience building and operating production-grade software systems in complex environments.
- Strong experience in reliability engineering, systems design, and operational excellence.
- Proficiency in programming languages such as Python, Java, or Go.
- Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack).
- Excellent problem-solving skills and a proactive mindset.
- Strong communication skills, both written and verbal.
- Ability to work independently and as part of a distributed team.
Nice to Have
- Experience with infrastructure as code (Terraform, CloudFormation).
- Knowledge of CI/CD practices and tools.
- Previous experience in a leadership or mentoring role.
What We Offer
- Competitive salary ranging from $140,000 to $180,000 per year.
- Fully remote work environment with flexible hours.
- Opportunities for professional development and continuous learning.
- Health, dental, and vision insurance.
- Generous paid time off and holiday schedule.
- Collaborative and inclusive company culture.
- Access to the latest tools and technologies.
- Support for work-life balance and well-being.
This Remote Senior Staff Software Engineer position at NMI offers a competitive salary and the opportunity to lead reliability engineering initiatives in a fully remote environment.
Who Will Succeed Here
- Expertise in Reliability Engineering practices, with hands-on experience in implementing monitoring tools like Prometheus and Grafana to proactively identify and resolve system issues.
- Proven ability to design and implement CI/CD pipelines using tools such as Jenkins or GitLab CI, ensuring seamless deployment processes that enhance system reliability.
- Strong programming skills in Python and Go, with a mindset focused on writing clean, maintainable code that adheres to best practices in cloud-native applications.