About the Role
We are looking for a Senior Site Reliability Engineer to join our team remotely. In this role, you will design, operate, and scale data infrastructure and distributed systems. You will manage critical systems that support large-scale data workloads, optimize performance, and drive automation across operations. This position offers a unique opportunity to shape operational practices and improve service stability for users worldwide.
What You'll Do
- Operate and maintain large-scale data systems, ensuring stability, scalability, and performance.
- Design and implement deployment processes, leveraging virtualization and containerization technologies.
- Monitor system health, analyze failures, and proactively identify sources of instability in complex distributed systems.
- Automate operational tasks and streamline workflows to improve efficiency.
- Collaborate with engineering and data teams to support their projects and remove roadblocks.
- Mentor peers in technical and operational best practices.
- Collaborate asynchronously with a globally distributed team, and attend team gatherings or conferences as needed.
Requirements
- 5+ years of experience in SRE, DevOps, operations, or software engineering roles managing production systems at scale.
- Proficiency with scripting and programming languages commonly used in SRE contexts (Python, Go, Ruby, etc.).
- Experience with configuration management and infrastructure provisioning tools such as Puppet, Ansible, or Terraform.
- Strong understanding of distributed systems and data platforms.
- Ability to work independently and effectively as part of a globally distributed, remote-first team.
- Excellent English communication skills, both written and verbal.
- Customer-oriented mindset, focused on supporting users and communities.
- Bonus: Experience with Kubernetes, Ceph, and operating large-scale data platforms.
What We Offer
- Competitive U.S.-based salary range: $113,082 – $175,725 per year, adjusted for skills, experience, and location.
- Fully remote work with flexibility and autonomy.
- Paid time off and sabbatical opportunities.
- Health coverage and reimbursement options.
- Professional development and home office stipends.
- Opportunities for global collaboration and travel to team events.
- A chance to contribute to large-scale, high-impact data infrastructure and distributed systems.
Who Will Succeed Here
- Proficiency in Python and Go for scripting and automation, with a strong understanding of performance tuning in Kubernetes clusters.
- A proactive approach to optimizing infrastructure with Terraform and Ansible, along with the self-motivation and discipline that remote work requires.
- Experience managing and troubleshooting distributed systems, including configuration management with Puppet and storage platforms such as Ceph, with a focus on high availability and reliability.