Site Reliability Engineer - Remote Opportunity
About the Role
Join our Global Services Engineering Team as a Site Reliability Engineer and take your career to the next level. In this remote position, you will work on innovative solutions that help our Global Services teams deliver exceptional value to our customers, and you will play a crucial role in ensuring system availability, reliability, scalability, and performance.
What You'll Do
- Design, build, and maintain highly reliable, scalable, and performant systems.
- Define, measure, and monitor key service reliability metrics and SLAs.
- Develop and improve monitoring, alerting, and incident response processes.
- Proactively identify performance bottlenecks and reliability risks, driving long-term solutions.
- Investigate, troubleshoot, and resolve complex production issues across distributed systems.
- Automate operational tasks to reduce manual effort and improve system efficiency.
- Participate in on-call rotations and lead incident management and post-incident reviews.
- Collaborate with engineering teams to influence system architecture and reliability best practices.
- Continuously improve deployment, release, and rollback processes to minimize risk and downtime.
- Enhance and maintain CI/CD pipelines and other tooling as required.
Requirements
- At least 3 years of experience in an SRE role, with a strong understanding of Linux/Unix systems and networking fundamentals.
- Experience with distributed systems and microservices architectures.
- Strong understanding of security and compliance considerations in production environments.
- Proficiency with containerization and orchestration technologies such as Docker and Kubernetes.
- Good working knowledge of Java, Python, or Go, and familiarity with common development practices and methodologies.
- Hands-on experience with databases, especially PostgreSQL and MySQL.
- Solid hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack.
Nice to Have
- Familiarity with incident management tools.
- Experience working in a fast-paced environment.
- Knowledge of DevOps practices.
What We Offer
- Flexible working patterns to suit your lifestyle.
- Comprehensive health and wellness benefits.
- Opportunities for professional growth and development.
- A collaborative and innovative work environment.
- Support for your work-life balance.
This Site Reliability Engineer role at Akamai offers a unique opportunity to work remotely while enhancing system performance in a collaborative environment.
Who Will Succeed Here
- Proficient in containerization and orchestration with Docker and Kubernetes, enabling seamless deployment and scaling of microservices in a cloud environment.
- A strong analytical mindset focused on monitoring and troubleshooting system performance, using tools like Prometheus and Grafana for observability.
- Hands-on experience with both relational databases (PostgreSQL, MySQL) and cloud services (AWS), ensuring robust data management and high availability.