About the Role
Join Poolside as a Mid-Senior Data Engineer in a fully remote position on our Pretraining Data team, where your mission will be to build and scale our Model Factory. In this hands-on role, you will architect and maintain high-performance pipelines that transform trillions of raw tokens into the high-quality datasets our models need. Your work on data modeling and distributed pipeline optimization will directly influence model performance.
What You'll Do
- Build and maintain high-performance pipelines for processing trillions of tokens.
- Deliver diverse and high-quality datasets for pre-training foundation models.
- Collaborate with teams such as Pretraining, Posttraining, Evals, and Product to ensure alignment on model quality.
- Engineer ingestion, deduplication, and streaming systems that handle petabyte-scale data.
- Bridge the gap between raw web crawls and GPU clusters to enhance model performance.
Requirements
- Strong background in building production-grade, distributed data systems for machine learning.
- Experience with orchestration tools like Slurm, Airflow, or Dagster.
- Proficiency in CI/CD, Grafana, and Prometheus for observability and reliability.
- Expert-level knowledge of Python and ability to write clean, maintainable code.
- Familiarity with libraries like Polars, Dask, or PySpark.
Nice to Have
- Experience building trillion-scale, state-of-the-art (SOTA) pretraining datasets.
- Experience translating research to production at scale.
- Familiarity with OCR, web crawling, or evals.
What We Offer
- Fully remote work with flexible hours.
- 37 days of vacation and holidays per year.
- Health insurance allowance for you and your dependents.
- Company-provided equipment and home office allowances.
- A diverse and inclusive people-first culture.