RL Environments

Bespoke Labs

Reposted 15 Hours Ago

Hybrid

Mountain View, CA, USA

Mid level

Develop systematic strategies for creating RL environments, analyze agent behavior and failures, validate and package benchmark environments for external use.

The summary above was generated by AI

About Bespoke Labs

Bespoke Labs is an applied AI research lab pioneering data and RL environment curation for training and evaluating agents.

Recently, we curated Open Thoughts, one of the best open reasoning datasets and one used by multiple frontier labs; trained SOTA specialized models such as Bespoke-MiniChart-7B and Bespoke-MiniCheck; and used reinforcement learning to teach agents multi-turn tool calling.

Bespoke is uniquely positioned to capture a large share of the market for data and RL environment curation.

About The Role

We're looking for an RL Environment Research Engineer to accelerate how we create, evaluate, and benchmark training environments for AI agents. You'll develop systematic approaches to environment design, identify where agents fail, and turn those insights into high-quality training data and benchmarks.

This role combines research intuition with practical execution. You'll need to understand agent behavior deeply—spotting reward hacking, analyzing failure modes, and diagnosing why certain environments produce better training outcomes. Then you'll translate that understanding into repeatable processes and benchmark suites that we can showcase externally.

You're someone who enjoys both the detective work (analyzing agent rollouts, finding patterns in failures) and the building work (designing environments, creating evaluation pipelines). You can move between studying the science of what makes environments effective and actually producing those environments at scale.

What You'll Do

Develop systematic strategies and recipes for creating high-quality RL environments that effectively train and evaluate agents.
Study how LLMs and agents fail across different task types, identifying patterns that inform better environment design.
Create benchmark environments that test specific agent capabilities, packaging them for external release on our evaluation platform.
Verify environment quality through hands-on testing—training small-scale agents, checking for reward hacking, and analyzing training dynamics.
Work with our environment creation pipeline to scale production of validated environments.
Analyze agent rollout data to uncover insights about what makes environments challenging, diverse, and pedagogically valuable.
Collaborate with the team to ensure benchmarks integrate smoothly into our external-facing dashboards.
Establish quality standards and evaluation protocols that maintain high bars as we scale environment production.

What We're Looking For

Research and analytical skills:

Strong foundation in machine learning, through either a PhD or MS in ML or CS, or equivalent industry experience.
Deep curiosity about agent behavior and failure modes, with the ability to form hypotheses and test them systematically.
Experience analyzing complex systems and extracting actionable insights from data.
Patience and attention to detail for studying agent rollouts and identifying subtle patterns.

Technical execution:

Proficiency in Python and ML frameworks (PyTorch, JAX, or similar).
Experience with RL concepts and agent training, even if your background is not primarily in RL.
Ability to design experiments, run training loops, and interpret results.
Comfortable working with cloud platforms (GCP, AWS) for running experiments at scale.

Practical engineering:

Ability to build pipelines and automation that scale research insights into production.
Experience with data analysis tools and creating reproducible workflows.
Systematic approach to quality verification and testing.

Nice to Have

Hands-on experience with reinforcement learning or agent training systems
Background in data curation, dataset creation, or evaluation benchmark design
Experience with AI safety, robustness testing, or adversarial evaluation
Publications or projects related to RL, agent evaluation, or data-centric AI
Understanding of how to design environments that surface specific failure modes
Experience shipping research artifacts (datasets, benchmarks, evaluation suites) to the community

Logistics

Location: Mountain View, CA.

Compensation: Competitive salary and equity based on experience and background

Benefits: Health coverage, flexible work arrangements, and the opportunity to shape how the AI community evaluates and trains agents

We encourage applications from candidates with diverse research backgrounds. If you're passionate about understanding agent behavior and creating systematic approaches to environment design, we'd love to hear from you.

800 W El Camino Real, Mountain View, California, United States, 94040

Similar Jobs

Handshake

Director, Applied Machine Learning

13 Days Ago

In-Office

San Francisco, CA, USA

375K-400K Annually

Senior level

Edtech • Enterprise Web • HR Tech • Software

Lead a team bridging frontier AI research and enterprise delivery to design post-training and RL environments, scale post-training infrastructure, support strategic lab projects, manage client relationships, and grow engineering and research talent while shaping technical roadmap.

Top Skills: Agents, Annotation Tooling, Evaluation Frameworks, Human Feedback Collection, Post-Training, PPO, Reinforcement Learning, RL Environments, SFT

Anthropic

Research Engineer, Computer Use

Yesterday

In-Office

San Francisco, CA, USA

500K-850K Annually

Expert/Leader

Artificial Intelligence • Natural Language Processing • Generative AI

Design and run experiments to improve models' perception and agentic capabilities for using software. Build RL training environments, evaluation frameworks, and validation pipelines. Collaborate with model training, infrastructure, and product teams to deploy research into production and measure model performance on complex computer tasks.

Top Skills: Computer Vision, Large-Scale ML Infrastructure, Machine Learning, Multimodal Models, Python, Reinforcement Learning, RL Environments, Simulation

Databricks

Staff Research Engineer, Data Agents

4 Days Ago

In-Office

San Francisco, CA, USA

190K-270K Annually

Junior

Big Data • Machine Learning • Software • Analytics • Big Data Analytics

Develop post-training recipes and systems for enterprise data agents capable of autonomous planning, code generation, and multi-step workflows. Partner with product teams to turn prototypes into production features for Databricks' Genie, and build context-discovery systems that use lakehouse data, notebooks, and code. Provide design, execution, debugging, and mentorship to raise team technical standards.

Top Skills: Agentic RL, Agents, Genie, Lakehouse, LLMs, Model Training, Notebooks, Post-Training Workflows, Reinforcement Learning, RL Environments

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
