About the Role
The Wikimedia Foundation is seeking a Senior Site Reliability Engineer to join our team. In this position, you will play a pivotal part in supporting and improving the infrastructure that powers Wikipedia, one of the most visited websites in the world. The position is fully remote and offers the opportunity to work with a diverse, globally distributed team dedicated to ensuring the reliability and performance of our platform.
What You'll Do
- Perform day-to-day operational and DevOps tasks on Wikimedia’s public-facing infrastructure, including deployment, maintenance, configuration, and troubleshooting.
- Implement and use configuration management and deployment tooling, such as Puppet and Kubernetes.
- Lead continuous improvement initiatives by automating the installation, configuration, and maintenance of services on our platform.
- Collaborate closely with product teams to design scalable functionality and ensure operational efficiency.
- Participate in a 24/7 on-call rotation, responding to incidents and ensuring prompt recovery from outages.
- Mentor peers in technical and operational areas, fostering a culture of knowledge sharing and continuous learning.
- Engage in asynchronous communication with a global, cross-functional team.
- Travel 1-2 times a year for in-person events and team meetings.
Requirements
- 6+ years of experience in an SRE, Operations, or DevOps role.
- Proficiency in shell scripting (e.g., Bash) and at least one programming language such as Python, Go, or Ruby; Python is strongly preferred.
- Experience with configuration management tools, particularly Puppet and Ansible.
- Thorough understanding of TCP/IP, HTTP, TLS, and DNS protocols.
- Strong Linux system-level troubleshooting skills and experience with package management on Linux systems (Debian preferred).
- A track record of identifying gaps in existing tasks and processes and implementing automation to close them.
- Excellent spoken and written English.
- Ability to work independently within a distributed team.
- Experience with incident response and post-incident root cause analysis.
Nice to Have
- Familiarity with high-performance HTTP(S) proxy and caching software (e.g., HAProxy, Varnish, Nginx).
- Experience with Linux kernel tuning for high-traffic loads.
- Knowledge of monitoring, metrics, and logging infrastructure (e.g., Prometheus, Grafana).
- Contributions to Free and Open Source software projects.
- Experience with LAMP stack technologies (PHP/HHVM, memcached/Redis) and MediaWiki.
- Experience defining and implementing cross-team service level objectives (SLOs).
What We Offer
- A competitive salary range of $113,082 to $175,725, depending on skills and experience.
- Remote-first work culture with flexibility to work from anywhere.
- Opportunities for professional development and continuous learning.
- A diverse, inclusive workplace committed to equity.
- Support for work-life balance, including flexible hours and remote work benefits.
Join the Wikimedia Foundation as a remote Senior Site Reliability Engineer and contribute to our mission of making knowledge freely accessible to all. Apply now to be part of a team that values innovation, collaboration, and community engagement.
The role pairs a competitive salary with the chance to contribute to a globally recognized platform at an organization committed to open-source technology.