About the Role
We are seeking a DevOps Engineer for our GPU-Accelerated Computing Platform to join our team at Drexel University. This fully remote position focuses on building and operating a shared computing platform dedicated to GPU-accelerated workloads, particularly AI model training. You will work closely with the University Research Computing Facility (URCF) as part of a collaborative team.
What You'll Do
- Develop and maintain automation for provisioning, configuring, and managing the cluster using tools such as Ansible, Warewulf, and Kubernetes.
- Contribute to the Kubernetes platform layer, including networking, storage integration, security policies, and workload orchestration.
- Assist in building out storage and data infrastructure, including iRODS for data management and Globus for data transfer, and integrate it with the compute cluster.
- Troubleshoot issues across the stack, from bare-metal boot problems to container orchestration bugs.
- Write and maintain operational and user-facing documentation to support users and team members.
- Coordinate with Drexel’s IT teams on shared infrastructure concerns, including networking, DNS, and firewall rules.
- Contribute to web application development for a user-facing portal for project management, permissions, and usage tracking.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 1-3 years of experience in DevOps or related fields.
- Experience with infrastructure work such as Linux systems administration, configuration management, containers, or container orchestration.
- Comfortable working in a terminal with tools like Git, SSH, and a text editor.
- Proficiency in at least one scripting language (Python, Bash, etc.).
- Strong written communication skills and the ability to work independently in a fully remote setting.
Nice to Have
- Experience with Kubernetes and bare-metal provisioning or HPC cluster management.
- Familiarity with tools such as Ansible, Warewulf, RKE2, Cilium, Kubeflow, Weka, iRODS, and Globus.
- Web application development experience in any stack.
- Experience in an academic or research computing environment.
What We Offer
- Competitive salary ranging from $90,430 to $135,640 annually.
- Fully remote work environment with flexible hours.
- Opportunity to work on innovative projects in AI and GPU-accelerated computing.
- Access to professional development resources and training opportunities.
- Collaborative team culture with a focus on research and innovation.
This position is grant-funded and employment is contingent upon the continued availability of those funds. We encourage you to apply even if you do not meet all the listed qualifications. Join us in shaping the future of research computing at Drexel University!
This remote DevOps Engineer position at Drexel University offers a unique opportunity to work on innovative GPU-accelerated computing projects. With a competitive salary and flexible work arrangements, it's an excellent fit for those passionate about AI and research computing.