About the Role
Underdog is hiring a full-time Senior Site Reliability Engineer. This is a remote position open to candidates anywhere in the USA, giving you the flexibility to balance your work and personal life. In this role, you will be responsible for maintaining and improving our infrastructure and ensuring the high availability and reliability of our services.
What You'll Do
- Design and implement scalable and reliable infrastructure solutions.
- Monitor system performance and troubleshoot issues to ensure optimal uptime.
- Collaborate with development teams to enhance system reliability and performance.
- Automate operational tasks to improve efficiency and reduce manual intervention.
- Participate in on-call rotations to provide support for production systems.
Requirements
- 5+ years of experience in Site Reliability Engineering or related fields.
- Strong knowledge of cloud platforms (AWS, GCP, Azure).
- Proficiency in scripting and automation tools (Python, Bash, Terraform).
- Experience with containers and container orchestration (Docker, Kubernetes).
- Familiarity with monitoring tools (Prometheus, Grafana, ELK stack).
Nice to Have
- Experience with CI/CD pipelines and tools.
- Knowledge of network protocols and security best practices.
- Familiarity with database management and optimization.
What We Offer
- Competitive salary ranging from $160,000 to $240,000.
- Flexible remote work environment.
- Comprehensive health benefits and wellness programs.
- Opportunities for professional development and growth.
- Collaborative and inclusive company culture.
Who Will Succeed Here
- Proficiency in managing multi-cloud environments, with hands-on experience in AWS, GCP, and Azure for deploying and monitoring applications.
- Strong proficiency in automation and orchestration tools such as Terraform and Kubernetes, with a mindset focused on infrastructure as code and continuous delivery.
- A proactive approach to troubleshooting, using Prometheus and Grafana for monitoring, with an emphasis on performance optimization and incident response in a remote work environment.