Poolside · 13.04.26
AI SCORE 8.5

Senior Data Scientist - Pretraining Datasets (Remote)

$120K–$150K/year

About the Role

Poolside is hiring a Senior Data Scientist (Pretraining Datasets) for a fully remote position. Your primary mission is to improve the quality of the pretraining datasets used to train Poolside's models, drawing on your expertise and hands-on experience in machine learning and engineering.

What You'll Do

  • Lead initiatives to enhance the quality of pretraining datasets using your background in machine learning and engineering.
  • Design and implement complex data pipelines that generate large volumes of diverse datasets while optimizing resource usage.
  • Collaborate closely with the Pretraining, Post-training, Evals, and Product teams to ensure high-quality model outputs.
  • Stay current with the latest research in dataset design and pretraining to inform your work.
  • Conduct original research through time-boxed experiments that drive improvements in data quality.
  • Use a high-performance distributed data pipeline and a large GPU cluster to process massive data volumes.

Requirements

  • Strong background in machine learning and engineering, particularly with Large Language Models (LLMs).
  • Experience with transformer architectures and an understanding of how LLMs learn.
  • Familiarity with data ablations, scaling laws, and mid-training/post-training techniques.
  • Strong Python programming skills and solid prompt engineering abilities.
  • Experience working with large-scale GPU clusters and distributed data pipelines.
  • Research experience, including authorship of scientific papers on relevant topics, is a plus.

Nice to Have

  • Experience in building trillion-scale pretraining datasets.
  • Knowledge of data curation, deduplication, data mixing, and tokenization.
  • Ability to discuss the latest research papers and engage in detailed technical discussions.

What We Offer

  • Fully remote work with flexible hours.
  • 37 days of vacation and holidays per year.
  • Health insurance allowance for you and your dependents.
  • Company-provided equipment and home office allowances.
  • A culture that prioritizes well-being, continuous learning, and inclusivity.
Why This Job (8.5 of 10)

This Senior Data Scientist role at Poolside offers the chance to shape the pretraining data behind Poolside's models, combined with fully remote work, flexible hours, and 37 days of vacation and holidays per year.

About Poolside

Industry: Tech
Location: Remote
