Backend Engineer

Bespoke Labs

Reposted 15 Hours Ago

Hybrid

Mountain View, CA, USA

Mid level

Design and maintain the infrastructure for RL environments, focusing on execution, performance optimization, and production excellence while collaborating with research teams and clients.

The summary above was generated by AI

About Bespoke Labs

Bespoke Labs is an applied AI research lab pioneering data and RL environment curation for training and evaluating agents.

Recently, we curated Open Thoughts, one of the best open reasoning datasets used by multiple frontier labs, trained SOTA specialized models such as Bespoke-MiniChart-7B and Bespoke-MiniCheck, and built the environment infrastructure that frontier labs and enterprises use to make their agents reliable.

Bespoke is uniquely positioned to capture a large share of the market for data and RL environment curation.

About the Role

We're looking for an Infrastructure Engineer to own the execution layer beneath our RL environments: the systems that let an agent operate inside a realistic, multi-tool world coherently for hours or days.

This is a hard systems problem disguised as an AI job. As the tasks agents can complete keep lengthening, the environments that train them have to stay coherent across far longer horizons than anything that exists today. That means sandboxing and isolation you can trust, execution that's fast and cheap enough to run at training scale, and the ability to snapshot, restore, inspect, and branch a running environment instead of treating every rollout as one-shot. You'll build the platform that makes all of this possible.

You'll work closely with our research and data teams, and directly with frontier labs and enterprise customers, to turn environment designs into infrastructure that runs reliably in production.

What You'll Do

Environment Execution & Sandboxing
- Design and own the sandboxing and execution layer that environments run inside.
- Build systems to snapshot and restore environment state (disk, process, and where relevant memory and accelerator state) so runs can be paused, resumed, inspected, and branched rather than executed once.
- Develop the machinery to detect failure modes early in a rollout (reward hacks, infra faults, fairness issues) and to revert to a known-good state, patch, and continue.
- Extend execution to long-horizon and multi-node environments, where an agent operates across many tools and services over hours or days.
Performance & Scale
- Own the performance characteristics of the platform: throughput, latency, and cost-per-rollout at scale.
- Drive utilization and scheduling so we can run far more environment rollouts per dollar without sacrificing reliability.
- Profile and remove bottlenecks across the stack, from container startup to environment teardown.
- Build the observability that lets us understand what's happening inside thousands of concurrent, long-running rollouts.
Environment Platform
- Build and maintain the framework for specifying, packaging, and deploying RL environments, which is used internally by both humans and agents authoring environments.
- Create the tooling that lets researchers and environment authors debug a specific failure across hundreds of long agent traces.
Collaboration & Production Excellence
- Scale prototypes into production systems with reproducible workflows and high engineering standards.
- Write the documentation and tools that let internal teams and external users build on the platform.

What We're Looking For

Systems & Infrastructure
- Strong track record building production systems or research infrastructure at scale: distributed systems, execution engines, container/sandboxing infrastructure, or similar.
- Deep comfort with the systems layer: containers and isolation (e.g. namespaces, cgroups, VMs, gVisor/Firecracker-style sandboxing), filesystems, process and state management.
- Experience making systems fast and cheap — profiling, scheduling, resource utilization, and cost optimization at scale.
- Proficiency with cloud platforms (GCP, AWS) and distributed computing.
- Strong engineering fundamentals and a systematic approach to testing, validation, and reliability.
Execution & Ownership
- Comfort operating in ambiguity.
- Strong Python skills; comfort in a systems language (Rust, Go, or C++) is a plus.
- Ability to use modern tools such as Claude Code effectively.
Collaboration & Communication
- Excellent communication skills for working with research teams and enterprise customers.
- Ability to translate between research needs and infrastructure requirements.
- Comfortable presenting technical work to diverse audiences.

Nice to Have

- Experience with RL training or evaluation infrastructure, or the execution layer for agent rollouts.
- Experience with checkpoint/snapshot-restore systems, CRIU, or distributed state management.
- Background in high-throughput, low-latency execution systems.
- Contributions to widely-used infrastructure, datasets, benchmarks, or open-source systems.
- Previous experience in a research engineering or infrastructure role at an AI or systems-heavy company.

Logistics

Location: Mountain View, CA

Compensation: Competitive salary and equity

Benefits: Health coverage and the opportunity to work directly with the world's leading AI research labs

800 W El Camino Real, Mountain View, California, United States, 94040
