Senior Engineer - Distributed Systems & ML Large-Scale Training

$140K–$180K/year

About the Role

We're hiring a Senior Engineer specializing in distributed systems and large-scale ML training to join the team at Pluralis Research. In this remote role, you will play a critical part in implementing a novel substrate for training distributed ML models efficiently under consumer-grade internet conditions. Your expertise will help shape the future of community-trained models, ensuring they are robust and self-sustaining.

What You'll Do

  • Design and implement large-scale distributed training systems optimized for heterogeneous hardware under low-bandwidth, high-latency conditions.
  • Develop and optimize model-parallel training strategies using custom sharding techniques to minimize communication overhead.
  • Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
  • Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
  • Build monitoring and metrics systems to track training progress and model quality and to identify system bottlenecks.
  • Architect resilient training systems capable of handling node failures and network partitions.
  • Design peer-to-peer topologies for decentralized coordination across non-co-located nodes.
  • Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.

Requirements

  • 5+ years of experience building and operating distributed systems in production.
  • Hands-on expertise with distributed training frameworks such as FSDP, DeepSpeed, or Megatron.
  • Deep understanding of parallelism techniques for distributed training, including data, tensor, and pipeline parallelism.
  • Expert-level proficiency in Python with production experience in concurrency, error handling, and clean architecture.
  • Strong networking fundamentals including P2P systems, gRPC, routing, and NAT traversal.
  • Experience optimizing GPU workloads, memory management, and large-scale compute efficiency.

Nice to Have

  • Familiarity with cloud platforms and services.
  • Experience with containerization technologies like Docker.
  • Knowledge of machine learning frameworks such as TensorFlow or PyTorch.

What We Offer

  • Equity-heavy compensation with meaningful ownership in a mission-driven company.
  • Competitive base salary for senior engineering roles in the United States.
  • Visa sponsorship available for exceptional candidates.
  • Remote-first work environment with optional access to our Melbourne hub.
  • Join a world-class team with members previously at Google, Amazon, Microsoft, and leading startups.
  • Be part of a company backed by Union Square Ventures and other tier-1 investors.
