Applied Scientist - Evaluation & Model Behavior (Remote)
About the Role
We are seeking an Applied Scientist - Evaluation & Model Behavior to join our team at AGI, Inc. In this remote position, you will design and implement systems that measure and improve the performance of our Computer Use Agents. Our mission is to build everyday AGI that redefines human–AI collaboration, and your role will be pivotal in achieving this goal.
What You'll Do
- Model Behavior Design: Translate product requirements into technical specifications for model behavior. Engineer system prompts and few-shot examples to address specific capability gaps and behavioral failures.
- Evaluation Design: Define metrics for reasoning, tool usage, and safety, and validate these metrics against human judgment to ensure statistical rigor.
- Data Strategy: Design algorithms to filter, score, and select training data. Write Python scripts to sanitize inputs and manage the training data lifecycle from raw logs to high-quality datasets.
- Failure Analysis: Investigate regressions in model benchmarks. Diagnose root causes, distinguishing among data quality issues, prompt instruction failures, and underlying model capability gaps, and implement fixes.
- Ground Truth Management: Define rubrics and guidelines for human annotation. Maintain reference datasets ("Golden Sets") to establish a consistent baseline for model performance evaluation.
Requirements
- Master's degree or PhD in Computer Science, Data Science, Statistics, or a related technical field, or equivalent practical experience.
- 3+ years of experience in Data Science, Machine Learning, or Applied Science.
- Proficiency in Python, with experience writing production-quality code for data pipelines or evaluation harnesses.
- Experience with experimental design, A/B testing, or statistical analysis.
Nice to Have
- Experience with Large Language Models (LLMs), including prompt engineering, fine-tuning, or RLHF workflows.
- Experience building automated evaluation systems or implementing model-based evaluation frameworks.
- Ability to translate product requirements into measurable technical metrics.
- Experience managing human-in-the-loop data pipelines or annotation quality control.
What We Offer
- Competitive company-sponsored medical, dental, and vision insurance.
- Top-tier relocation and immigration support, including comprehensive relocation packages to help you move and settle in your new role.
If you are ready to take the next step in your data science career and want to be part of a team that is building the future of AI, apply now for this remote Applied Scientist position. We look forward to seeing how you can contribute to our mission!
Who Will Succeed Here
- Proficient in Python and experienced with frameworks such as TensorFlow or PyTorch for developing machine learning models, with a strong understanding of Large Language Models (LLMs) and how to optimize model behavior.
- Self-motivated and disciplined, thriving in a remote work environment by managing time effectively and collaborating asynchronously with cross-functional teams to achieve project milestones.
- Analytical mindset with hands-on experience in A/B testing and statistical analysis, capable of interpreting complex datasets to refine model performance and validate improvements.