About the Role
We're hiring a Senior Site Reliability Engineer to join the team at Haystack. In this remote role, you will own reliability for a large distributed system, ensuring consistent performance for millions of connected devices worldwide.
What You'll Do
- Design, deploy, and scale high-performance observability platforms and Prometheus monitoring systems to support millions of global devices.
- Architect and maintain massive Elasticsearch clusters and robust data pipelines leveraging Kafka for real-time streaming.
- Drive "Infrastructure as Code" (IaC) initiatives by automating complex cloud environments using Terraform and Ansible.
- Build custom internal tools and sophisticated automation scripts using Python, Go, or Ruby to eliminate toil and boost system performance.
- Optimize Linux systems (Debian/Ubuntu) and participate in a collaborative on-call rotation to maintain 24/7 service availability.
Requirements
- 5+ years of battle-tested experience in Site Reliability Engineering (SRE) or DevOps within enterprise-scale cloud environments.
- Mastery of the observability stack, specifically Prometheus, Grafana, and the full ELK Stack (Elasticsearch, Logstash, Kibana).
- Expert-level Linux systems administration skills and deep knowledge of distributed systems architecture and Kafka messaging.
- Hands-on proficiency with automation and configuration tools, including Terraform and Ansible, and programming in Python or Go.
- The ability to thrive in a fast-paced environment, tackling complex scaling challenges for high-traffic cloud services.
Nice to Have
- Experience with cloud platforms like AWS or Azure.
- Familiarity with container orchestration technologies such as Kubernetes.
- Knowledge of security best practices in cloud environments.
What We Offer
- Competitive rate of £55 to £62 per hour (Inside IR35).
- Long-term stability with an initial 12-month contract and high potential for extension.
- 100% remote working flexibility while supporting a premier London-based technology hub.
- Opportunity to work on a truly global scale, impacting the experience of millions of daily active users.
- Access to a supportive team and resources to enhance your skills and career growth.
This Senior Site Reliability Engineer contract is fully remote and supports services used by millions of people globally. With a competitive hourly rate and a modern observability and automation stack, it suits experienced SRE and DevOps professionals.
About Haystack
Explore exciting career opportunities at Haystack in 2026. Browse a wide range of remote, hybrid, and office positions tailored to your skills. Utilize our advanced filters, track your applications, and gain valuable insights into our company culture. Whether you’re seeking your next challenge or a fresh start, find the perfect Haystack role that matches your career aspirations today.
Who Will Succeed Here
- Deep expertise in monitoring and observability tools such as Prometheus and Grafana, with a proven track record of deploying scalable systems that handle millions of metrics in real time.
- Strong proficiency in Infrastructure as Code (IaC) tools like Terraform and configuration management with Ansible, with a focus on automation and efficiency in remote environments.
- Robust programming skills in languages such as Python, Go, or Ruby, coupled with experience building resilient microservices architectures and a proactive approach to troubleshooting and performance optimization.