Research Engineer

Judgment Labs

Reposted 19 Days Ago

In-Office

San Francisco, CA, USA

Mid level

Build scalable systems to aggregate, index, and analyze large-scale agent interaction data; develop agent-based evaluation systems; design post-training and optimization workflows; build internal tools and infrastructure for experimentation, analysis, and training; ship research into production and collaborate cross-functionally to improve agent behavior.

The summary above was generated by AI

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). While traditional observability focuses on logging exceptions and latency, our ABM surfaces behavioral anomalies, such as instruction drift and context-retrieval loss, in large-scale production environments.

Hundreds of teams building autonomous agents rely on Judgment to understand how their systems are behaving post-deployment. Instead of reactive incident triage, they cluster patterns across conversations and workflows, correlate regressions to specific interaction types, and pinpoint where reliability breaks down in their usage context.

We’ve raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.

The Role:

We are looking for Research Engineers to build AI systems that use agent interaction data to help us understand how agents behave, evaluate them at scale, and improve them through learning and feedback.

Your research will not live on a whiteboard. You'll work directly with real-world agent data, apply frontier methods in production, and see your work ship immediately into the product. By making agent behavior measurable and debuggable, your systems will support teams deploying agents across finance, legal, operations, and other high-stakes workflows. You will own projects end-to-end, with significant autonomy, and work closely with the team to build self-improving agent systems.

What You'll Do:

Build systems to aggregate, index, and analyze large-scale agent interaction data to extract meaningful evaluation signals
Develop agent-based systems for analyzing and evaluating complex, long-running behaviors
Design and implement post-training and optimization workflows to improve agent behavior
Build internal tools and infrastructure to support rapid experimentation, analysis, and training

What We're Looking For:

You identify with at least one of the following:

You care about data quality, evaluation, and benchmarking, and are comfortable working hands-on with messy data
You have experience building agent systems and working with them in real-world or production settings
You have a strong background in reinforcement learning, agents, or machine learning fundamentals
You are comfortable working across infrastructure and systems, spanning training, data pipelines, and model serving
You are comfortable working across teams to translate research into product, balancing real-world customer constraints and tradeoffs
You enjoy turning ambiguous problems into clear, well-designed plans

Why Judgment?

Agents can’t work reliably without this. Today’s agents hallucinate, drift, and break in production. We’re building the infrastructure that fixes this: the monitoring layer that makes agents self-improving.
We’re wired to win. We're a team of fewer than 20, but we ship like 50+ on the daily. You'll be working with olympiad medalists, debate champions, and competitive athletes who bring that same intensity to company building.
Fast track to founding. Our engineers interface directly with customers, ship code into their environments, and use their feedback to dictate what’s next on the roadmap. Everyone on the team is either an ex-founder or a founder-to-be.
We make sure our people do their best work. If you deserve a spot on the team, money will never get in the way of it. Full benefits, Equinox, and a private chef to take care of you. We sprint hard, but we play hard too. Ask us about our Smash/Mario Kart tournaments.
We work in person in San Francisco.

425 Bush St, San Francisco, California, United States, 94108-3708

Similar Jobs

CoreWeave

Staff Applied Research Engineer

8 Days Ago

In-Office

Sunnyvale, CA, USA

207K-275K Annually

Senior level

Cloud • Information Technology • Machine Learning

Lead applied research to advance continuous learning for agents: design and evaluate LLM post-training and RL methods, implement and deploy experiments at scale, validate research on customer tasks, optimize GPU/distributed training, and mentor engineers while driving cross-functional technical direction.

Top Skills: CUDA, FastAPI, GPU, Kubernetes, Megatron, Postgres, Temporal

CoreWeave

Senior Applied Research Engineer

8 Days Ago

In-Office

Sunnyvale, CA, USA

182K-242K Annually

Senior level

Cloud • Information Technology • Machine Learning

Drive applied research to enable continuous learning for agents: design and evaluate LLM post-training methods (fine-tuning, RL, distillation), run large-scale GPU experiments, validate approaches on customer tasks, and help deploy and scale models and infrastructure across the stack.

Top Skills: CUDA, Distributed Training, FastAPI, GPUs, JAX, Kubernetes, LLMs, Megatron, Postgres, Python, PyTorch, Reinforcement Learning, Temporal

Sprinter Health

Software Engineer

18 Days Ago

Hybrid

San Francisco, CA, USA

160K-200K Annually

Senior level

Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth

Design and implement large-scale routing, scheduling, forecasting, and simulation systems to optimize clinician dispatch and capacity. Prototype and productionize optimization and predictive models, collaborate with product and ops, and own end-to-end projects in a distributed environment.

Top Skills: AppSync, AWS Amplify, BigQuery, CloudFormation, Distributed Scheduling, DynamoDB, Elasticsearch, Forecasting Frameworks, GraphQL, JavaScript, Kibana, Lambda, Looker, Monte Carlo Simulation, Node.js, OpenSearch, Optimization Frameworks, Python, Route Annealing, Simulation Frameworks, TypeScript

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene and excellent food, and it is a short drive from several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (PitchBook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine