Remote Senior Site Reliability Engineer - Technical Leadership
About the Role
Join Accenture Federal Services as a Remote Senior Site Reliability Engineer and play a pivotal role in enhancing the reliability and performance of our technical infrastructure. In this position, you will leverage your expertise to ensure that our systems are scalable and efficient, driving positive change across various federal sectors.
What You'll Do
- Design, build, and maintain reliable, scalable, and high-performance infrastructure and services to support business needs.
- Implement and advocate for SRE best practices, including automation, CI/CD pipelines, monitoring, and incident management.
- Collaborate with cross-functional teams to develop systems that meet high availability, performance, and reliability standards.
- Drive incident management processes, including root cause analysis, mitigation strategies, and long-term preventive measures.
- Establish, monitor, and refine service level objectives (SLOs), service level agreements (SLAs), and key performance indicators (KPIs) to ensure systems adhere to reliability and performance targets.
- Automate repetitive tasks to improve operational efficiency and reduce manual intervention.
- Build and maintain robust monitoring, logging, and alerting systems to ensure visibility into system performance and reliability.
- Provide technical mentorship and guidance to team members, fostering a culture of knowledge sharing and continuous improvement.
- Act as a technical leader by driving solutions to complex challenges, ensuring alignment with organizational goals.
- Prepare and deliver performance and reliability reports to stakeholders, offering insights and recommendations for improvements.
Requirements
- Proven experience in site reliability engineering or a similar role, with a focus on application and infrastructure scalability, reliability, and performance.
- Strong knowledge of ITSM principles and incident management processes.
- Expertise in automation tools, scripting, and infrastructure-as-code (IaC) technologies.
- Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and container technologies (e.g., Docker, Kubernetes).
- Strong analytical and problem-solving skills, with the ability to troubleshoot complex systems.
- Excellent communication and collaboration abilities, with a focus on cross-team partnerships.
- A passion for continuous learning, innovation, and driving impactful solutions.
Nice to Have
- Experience with security best practices in cloud environments.
- Knowledge of networking concepts and protocols.
- Familiarity with agile methodologies and DevOps practices.
What We Offer
- Competitive salary range of $140,000 to $180,000 annually.
- Comprehensive health benefits and wellness programs.
- Opportunities for professional development and certifications.
- Flexible work environment with remote work options.
- Collaborative and inclusive company culture.
- Access to cutting-edge technology and tools.
- Support for work-life balance and personal growth.
This role offers a unique opportunity to lead site reliability engineering efforts within a prominent technology firm. With a strong focus on cloud services and infrastructure, it promises a rewarding career path.
Who Will Succeed Here
- Expertise in container orchestration with Kubernetes and containerization with Docker, including deploying and managing microservices architectures in a cloud environment.
- Proficiency in automation and CI/CD practices with tools such as Jenkins or GitLab CI, building integration and delivery pipelines that improve system reliability and deployment speed.
- Demonstrated ability to implement effective monitoring and alerting with Prometheus and Grafana, and a proactive approach to identifying and resolving performance bottlenecks in infrastructure.