Customer.io
01.03.26
AI SCORE 8.5

Site Reliability Engineering Manager - Remote

$175K–$195K/year

About the Role

Customer.io is seeking an experienced Site Reliability Engineering Manager to join our team in a fully remote capacity. You will be responsible for the reliability, availability, and performance of our services, working closely with cross-functional teams to improve our infrastructure and operational processes. You will also lead a team of Site Reliability Engineers and help shape our engineering practices.

What You'll Do

  • Lead and mentor a team of Site Reliability Engineers, fostering a culture of collaboration and continuous improvement.
  • Develop and implement strategies to enhance system reliability and performance across our platforms.
  • Oversee incident management processes and ensure timely resolution of issues.
  • Collaborate with development teams to design and implement scalable and resilient systems.
  • Utilize monitoring and observability tools to proactively identify and address potential issues.
  • Drive initiatives to improve operational efficiency and reduce downtime.
  • Manage on-call rotations and ensure adequate coverage for critical systems.
  • Stay current with industry trends and best practices in site reliability engineering.

Requirements

  • 5+ years of experience in site reliability engineering or a related field.
  • Proven track record of managing and leading engineering teams.
  • Strong understanding of cloud infrastructure (AWS, GCP, or Azure).
  • Experience with containers and container orchestration (Docker, Kubernetes).
  • Proficiency in scripting and automation (Python, Bash, etc.).
  • Excellent problem-solving skills and the ability to work under pressure.
  • Strong communication skills, with the ability to convey complex technical concepts to non-technical stakeholders.
  • Experience with incident response and management processes.

Nice to Have

  • Familiarity with CI/CD pipelines and DevOps practices.
  • Experience in a B2B SaaS environment.
  • Knowledge of security best practices in cloud environments.

What We Offer

  • Competitive salary range of $175,000 - $195,000.
  • Fully remote work environment with flexible hours.
  • Comprehensive health benefits and wellness programs.
  • Generous paid time off and holiday schedule.
  • Opportunities for professional development and growth.
  • A supportive and inclusive company culture.
  • Access to the latest tools and technologies.
  • Collaborative team environment with a focus on innovation.

Why This Job (8.5 of 10)

This role offers a unique opportunity to lead a remote team in a growing B2B SaaS company. With a competitive salary and a focus on innovation, it's an attractive position for experienced professionals.

Who Will Succeed Here

Proficient in managing cloud services on AWS, GCP, or Azure, with hands-on experience in setting up and optimizing CI/CD pipelines using Docker and Kubernetes.

Strong leadership skills with a focus on remote team management, fostering a culture of accountability and continuous improvement within a fully distributed team environment.

Deep understanding of SRE principles and practices, demonstrating a proactive mindset for incident management and system reliability, while possessing a solid foundation in scripting with Python and Bash.

Learning Resources

AWS Well-Architected Framework (guide)

Career Path

  • Site Reliability Engineering Manager (now)
  • Director of Site Reliability Engineering (2-4 years)
  • VP of Engineering or Chief Technology Officer (5-7 years)

Market Overview

  • Market Size 2024: $60B
  • Annual Growth: 15.5%
  • AI Adoption: 45%
  • Investment: +35%
  • Labour Demand: +25%
  • Avg Salary: $145K

Skills & Requirements

  • Required: AWS, GCP, Azure
  • Growing in Demand: Terraform, Serverless Architecture, Observability Tools (e.g., Prometheus, Grafana)
  • Declining: Traditional Data Centers, On-Premise Virtualization (e.g., VMware)

Domain Trends

  • Increased Focus on Multi-Cloud Strategies: By 2025, 70% of organizations will adopt multi-cloud strategies to enhance flexibility and avoid vendor lock-in.
  • Rise of Automation in SRE: Automation tools are expected to reduce manual intervention in operations by 30% by 2024, leading to improved reliability and efficiency.
  • Growing Importance of Security in Cloud Operations: Security breaches in cloud environments have increased by 40% in the last year, driving demand for SREs with strong security skills.

All job postings are gathered automatically by algorithms. We do not review or verify listings, so be careful when applying and do not sign in with iCloud or Google services.