AI SCORE 8.5 / 10

Remote Senior Site Reliability Engineer - Infrastructure

$140K–$180K/year

Incident Response • SLO Monitoring • Cloud Infrastructure • AWS • GCP • Azure • Kubernetes • Docker • Python • Go

About the Role

We are looking for a Remote Senior Site Reliability Engineer to join our team at Underdog Sports. In this role, you will play a critical part in ensuring the reliability and scalability of our infrastructure as we continue to grow. As a founding member of the SRE team, you will help define our approach to operational excellence and reliability. This position offers a unique opportunity to make a significant impact from day one.

What You'll Do

Own and maintain the incident response process, defining procedures, tools, and best practices.
Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems.
Lead capacity planning initiatives, focusing on scalability and performance during peak traffic and game-day spikes.
Collaborate closely with platform, infrastructure, and product teams to enhance system reliability and developer experience.
Identify high-leverage reliability challenges and shape our incident response strategy.

Requirements

5+ years of experience in Site Reliability Engineering or a related field.
Strong understanding of incident response processes and best practices.
Experience with monitoring and alerting tools.
Proficiency in cloud infrastructure (AWS, GCP, or Azure).
Excellent problem-solving skills and a proactive approach to challenges.

Nice to Have

Familiarity with containers and container orchestration (Docker, Kubernetes).
Experience in capacity planning and performance tuning.
Knowledge of programming/scripting languages (Python, Go, etc.).

What We Offer

Competitive salary and performance-based bonuses.
Flexible remote work environment.
Health, dental, and vision insurance.
Generous paid time off and holiday schedule.
Opportunities for professional development and growth.

Why This Job: 8.5 of 10

This Remote Senior Site Reliability Engineer role at Underdog Sports offers a unique opportunity to shape the company's reliability practices while enjoying competitive pay and flexible work arrangements.

Who Will Succeed Here

→ Proficient in managing cloud infrastructure across AWS, GCP, and Azure, with hands-on experience in deploying and maintaining scalable applications in Kubernetes and Docker environments.

→ Strong analytical mindset with a proven track record in incident response, demonstrating the ability to quickly diagnose and resolve complex system outages while implementing effective SLO monitoring strategies.

→ Self-motivated and comfortable working in a fully remote environment, exhibiting excellent time management skills to balance multiple priorities and deliver operational excellence without direct supervision.

Learning Resources

→ Incident Response Guide (guide)

→ Site Reliability Engineering Course (course)

→ Kubernetes Basics (article)

Career Path

Remote Senior Site Reliability Engineer - Infrastructure (Now) → Lead Site Reliability Engineer (1-2 years) → Director of Site Reliability Engineering (3-5 years)

Market Overview

Market Size 2024: $8.5B
Annual Growth: 12.3%
AI Adoption: 45%
Investment: +200%
Labour Demand: +30%
Avg Salary: $135K

Skills & Requirements

Required

Incident Response • SLO Monitoring • Cloud Infrastructure

Growing in Demand

Chaos Engineering • Observability Tools (e.g., Prometheus, Grafana) • Automated Incident Management

Declining

Traditional ITIL Incident Management • On-Premise Infrastructure Management

Domain Trends

Increased Automation in Incident Response

By 2025, 60% of organizations will automate incident response processes to improve efficiency and reduce human error.

Shift to Cloud-Native Incident Management

80% of companies are adopting cloud-native technologies, leading to a 25% increase in demand for SREs skilled in cloud incident response.

Rise of AI-Driven Monitoring Solutions

AI-driven monitoring solutions are projected to grow by 35% in the next two years, enhancing real-time incident detection and response.

All job postings are gathered automatically by algorithms. We do not review or verify listings. Be careful when applying, and do not sign in with iCloud or Google services.