AGI, Inc.
12.04.26
AI Score: 8.5

Applied Scientist - Evaluation & Model Behavior (Remote)

$120K–$150K/year

About the Role

AGI, Inc. is hiring an Applied Scientist - Evaluation & Model Behavior. In this remote role, you will design and implement systems that measure and improve the performance of our Computer Use Agents. Our mission is to build everyday AGI that redefines human–AI collaboration, and your work will be central to that goal.

What You'll Do

  • Model Behavior Design: Translate product requirements into technical specifications for model behavior. Engineer system prompts and few-shot examples to address specific capability gaps and behavioral failures.
  • Evaluation Design: Define metrics for reasoning, tool usage, and safety, and validate these metrics against human judgment to ensure statistical rigor.
  • Data Strategy: Design algorithms to filter, score, and select training data. Write Python scripts to sanitize inputs and manage the training data lifecycle from raw logs to high-quality datasets.
  • Failure Analysis: Investigate regressions in model benchmarks. Diagnose root causes, distinguishing among data quality issues, prompt instruction failures, and underlying model capability gaps, then implement fixes.
  • Ground Truth Management: Define rubrics and guidelines for human annotation. Maintain reference datasets ("Golden Sets") to establish a consistent baseline for model performance evaluation.

Requirements

  • Master's degree or PhD in Computer Science, Data Science, Statistics, or a related technical field, or equivalent practical experience.
  • 3+ years of experience in Data Science, Machine Learning, or Applied Science.
  • Proficiency in Python, with experience writing production-quality code for data pipelines or evaluation harnesses.
  • Experience with experimental design, A/B testing, or statistical analysis.

Nice to Have

  • Experience with Large Language Models (LLMs), including prompt engineering, fine-tuning, or RLHF workflows.
  • Experience building automated evaluation systems or implementing model-based evaluation frameworks.
  • Ability to translate product requirements into measurable technical metrics.
  • Experience managing human-in-the-loop data pipelines or annotation quality control.

What We Offer

  • Competitive company-sponsored medical, dental, and vision insurance.
  • Top-tier relocation and immigration support, including comprehensive relocation packages to help you move and settle into your new role.

If you are ready to take the next step in your data science career and want to join a team building the future of AI, apply now for this remote Applied Scientist position. We look forward to seeing how you can contribute to our mission!

Language Requirements
English: C1
Why This Job: 8.5 of 10

Join a cutting-edge AI company as an Applied Scientist, focusing on model evaluation and behavior. Enjoy competitive benefits and relocation support.

Who Will Succeed Here

Proficient in Python and experienced with frameworks such as TensorFlow or PyTorch for developing machine learning models, with a strong understanding of large language models (LLMs) and how to optimize their behavior.

Self-motivated and disciplined, thriving in a remote work environment by effectively managing time and collaborating asynchronously with cross-functional teams to achieve project milestones.

Analytical mindset with hands-on experience in A/B testing and statistical analysis, capable of interpreting complex datasets to refine model performance and validate improvements.

Learning Resources

Python for Data Science Handbook (guide)

Career Path

  • Applied Scientist - Evaluation & Model Behavior (now)
  • Senior Applied Scientist (1-2 years)
  • Lead Data Scientist / AI Researcher (3-5 years)

Market Overview

  • Market size (2024): $16.6B
  • Annual growth: 11.5%
  • AI adoption: 79%
  • Investment in AI solutions: +45%
  • Labor demand for data scientists: +30%
  • Average salary for applied scientists: $120K

Skills & Requirements

Required: Python, Machine Learning, Data Science
Growing in Demand: Deep Learning, Natural Language Processing (NLP), Cloud Computing (AWS, Azure)
Declining: R Programming, MATLAB

Domain Trends

Increased Focus on Ethical AI
Over 60% of organizations are prioritizing ethical considerations in AI development, leading to a rise in demand for professionals skilled in ethical AI practices.
Integration of LLMs in Business Solutions
The use of Large Language Models (LLMs) has increased by 90% in enterprise applications, driving demand for specialists who can evaluate and optimize these models.
Shift Towards Automated Machine Learning (AutoML)
By 2025, 70% of machine learning models will be developed using AutoML tools, reducing reliance on traditional coding skills and increasing the need for data scientists who can interpret and validate AutoML outputs.

All job postings are gathered automatically by algorithms. We do not review or verify listings, so be careful when applying, and do not sign in with iCloud or Google services.