Remote Site Reliability Engineer - Join Point72's Tech Team
About the Role
We're hiring a Remote Site Reliability Engineer to join Point72's Technology Team. As we reimagine the future of investing, our team is dedicated to enhancing our IT infrastructure so that we remain at the forefront of a rapidly evolving technology landscape. In this role, you'll work with a team of experts who are passionate about experimenting and discovering new ways to leverage open-source solutions while embracing enterprise agile methodologies.
What You'll Do
- Design and implement automated operational workflows to improve system reliability and reduce manual intervention.
- Build and maintain observability solutions using tools such as Datadog, delivering metrics, monitoring, alerting, and dashboards.
- Partner with development teams to enhance application reliability, deployment safety, and performance through SRE best practices.
- Develop and maintain CI/CD pipelines and deployment automation using Bitbucket, Jenkins, GitHub Actions, and related tooling.
- Engineer scalable solutions for production environments across Linux and Windows systems.
- Automate infrastructure and operational tasks using Python, PowerShell, Bash, or similar scripting languages.
- Support and enhance the reliability of database platforms such as SQL Server and MongoDB from an SRE perspective.
- Participate in incident response, drive root cause analysis, and implement long-term reliability improvements.
- Define and enforce SLOs, SLIs, and error budgets in partnership with application teams.
- Collaborate with Networking, Platform, and Security teams to ensure end-to-end system reliability.
- Enable self-service and standardized operational patterns for development teams.
Requirements
- Strong hands-on experience with Linux and Windows operating systems.
- Proven experience building automation and tooling using Python or similar languages.
- Deep understanding of observability and monitoring, preferably with Datadog.
- Experience with CI/CD pipelines and deployment automation (Bitbucket, GitHub Actions, Jenkins, etc.).
- Operational and performance knowledge of SQL Server and MongoDB.
- Familiarity with cloud platforms (AWS or similar) and hybrid architectures.
- Solid understanding of networking concepts such as DNS, load balancing, and TCP/IP.
- Experience working closely with application development teams in an SRE or DevOps role.
- Experience with Kubernetes, OpenShift, and containerized workloads.
- Knowledge of infrastructure-as-code tools (Terraform, CloudFormation, etc.).
What We Offer
- Competitive salary range of $120,000 - $150,000 per year.
- Fully remote work environment, allowing you to work from anywhere.
- Opportunities for professional development and continuous learning.
- Collaborative and innovative team culture.
- Comprehensive benefits package including health insurance and retirement plans.
Who Will Succeed Here
- Proficient in Linux and Windows server management, with hands-on experience in scripting and automation using Python to enhance system reliability and performance.
- Strong familiarity with CI/CD pipelines using tools like Bitbucket and GitHub Actions, with a demonstrated ability to implement and optimize deployment processes in a remote work environment.
- Analytical mindset, with experience monitoring and troubleshooting infrastructure using Datadog and proficiency in SQL Server and MongoDB.