Haystack•10.04.26

AI SCORE 8.5 / 10

Senior Site Reliability Engineer - Remote

$143K–$162K/year

Prometheus•Grafana•ELK Stack•Elasticsearch•Kafka•Terraform•Ansible•Python•Go•Ruby•Linux

About the Role

We're hiring a Senior Site Reliability Engineer to join our dynamic team at Haystack. In this exciting remote role, you will be the architect of reliability for a massive distributed systems landscape, ensuring seamless performance for millions of connected devices worldwide.

What You'll Do

Design, deploy, and scale high-performance observability platforms and Prometheus monitoring systems to support millions of global devices.
Architect and maintain massive Elasticsearch clusters and robust data pipelines leveraging Kafka for real-time streaming.
Drive "Infrastructure as Code" (IaC) initiatives by automating complex cloud environments using Terraform and Ansible.
Build custom internal tools and sophisticated automation scripts using Python, Go, or Ruby to eliminate toil and boost system performance.
Optimize Linux systems (Debian/Ubuntu) and participate in a collaborative on-call rotation to maintain 24/7 service availability.

Requirements

5+ years of battle-tested experience in Site Reliability Engineering (SRE) or DevOps within enterprise-scale cloud environments.
Mastery of the Observability stack, specifically Prometheus, Grafana, and the full ELK Stack (Elasticsearch, Logstash, Kibana).
Expert-level Linux systems administration skills and deep knowledge of distributed systems architecture and Kafka messaging.
Hands-on proficiency with automation and configuration tools, including Terraform, Ansible, and programming in Python or Go.
The ability to thrive in a fast-paced environment, tackling complex scaling challenges for high-traffic cloud services.

Nice to Have

Experience with cloud platforms like AWS or Azure.
Familiarity with container orchestration technologies such as Kubernetes.
Knowledge of security best practices in cloud environments.

What We Offer

Competitive rate of £55 - £62 per hour (Inside IR35).
Long-term stability with an initial 12-month contract and high potential for extension.
100% remote working flexibility while supporting a premier London-based technology hub.
Opportunity to work on a truly global scale, impacting the experience of millions of daily active users.
Access to a supportive team and resources to enhance your skills and career growth.

Why This Job: 8.5 of 10

This Senior Site Reliability Engineer role offers a unique opportunity to work remotely while impacting millions of users globally. With a competitive salary and a chance to work with cutting-edge technologies, it's a great fit for experienced professionals.

About Haystack

Explore exciting career opportunities at Haystack in 2026. Browse a wide range of remote, hybrid, and office positions tailored to your skills. Utilize our advanced filters, track your applications, and gain valuable insights into our company culture. Whether you’re seeking your next challenge or a fresh start, find the perfect Haystack role that matches your career aspirations today.

Industry

Tech

Location

Remote

Who Will Succeed Here

→ Deep expertise in monitoring and observability tools such as Prometheus and Grafana, with a proven track record of deploying scalable systems that handle millions of metrics in real-time.

→ Strong proficiency in Infrastructure as Code (IaC) tools like Terraform and configuration management with Ansible, demonstrating a mindset focused on automation and efficiency in remote environments.

→ Robust programming skills in languages such as Python, Go, and Ruby, coupled with experience in building resilient microservices architectures and a proactive approach to troubleshooting and optimizing performance.

Learning Resources

→ Prometheus Monitoring System (guide)

→ Getting Started with Grafana (guide)

→ Introduction to Terraform (course)

Career Path

Senior Site Reliability Engineer (Now) → Lead Site Reliability Engineer (1-2 years) → Site Reliability Architect (3-5 years)

Market Overview

Market Size 2024: $7.5B
Annual Growth: 15.2%
AI Adoption: 35%
Investment: +25%
Labour Demand: +40%
Avg Salary: $130K

Skills & Requirements

Required

Prometheus, Grafana, ELK Stack

Growing in Demand

Kubernetes, OpenTelemetry, Cloud-Native Development

Declining

Nagios, Zabbix

Domain Trends

Increased Adoption of Cloud-Native Technologies

By 2025, 80% of enterprises will have migrated to cloud-native architectures, driving demand for SREs familiar with tools like Prometheus and Kubernetes.

Shift Towards Observability over Monitoring

The observability market is projected to grow by 25% annually as organizations seek to understand system performance deeply, promoting skills in tools like ELK Stack and Grafana.

Rise of Automation in SRE Practices

Automation tools like Terraform and Ansible are becoming essential, with 70% of companies reporting increased efficiency through automated deployment and monitoring processes.

All job postings are automatically gathered by algorithms. We do not review or verify listings. Be careful when applying, and do not sign in with iCloud or Google services.