Judgment Labs

Senior Cloud Infrastructure Engineer

Reposted 20 Days Ago
In-Office
San Francisco, CA, USA
Senior level
Design, build, and operate multi-region, highly available cloud and self-hosted/BYOC deployment architectures across AWS, GCP, and Azure. Implement secure networking, compliance/data residency solutions, automated provisioning, and observability for distributed customer environments. Own infrastructure roadmap, reliability, and enterprise deployment lifecycle, including documentation and customer-facing implementation guides.
The summary above was generated by AI

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). While traditional observability focuses on logging exceptions and latency, our ABM surfaces behavioral anomalies such as instruction drift and context retrieval loss in scaled production environments.
Hundreds of teams building autonomous agents rely on Judgment to understand how their systems are behaving post-deployment. Instead of reactive incident triage, they cluster patterns across conversations and workflows, correlate regressions to specific interaction types, and pinpoint where reliability breaks down in their usage context.
We’ve raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.
The Role:
We are looking for a Senior Cloud Infrastructure Engineer to architect and scale the deployment infrastructure that powers agent behavior monitoring at production scale. This role is crucial for enabling enterprise customers to run Judgment in their environments—whether that's multi-region cloud, self-hosted, or BYOC deployments—while maintaining the security, compliance, and reliability standards they require. We need someone who has built distributed systems that handle real production traffic and can own infrastructure from architecture through operations.
What You'll Do:
• Design and implement multi-region cloud architecture with automatic failover and disaster recovery across AWS, GCP, and Azure.
• Architect and deploy regional compliance solutions (data residency, sovereignty) for enterprise customers in different geographies.
• Design and implement self-hosted and BYOC (Bring Your Own Cloud) deployment architectures for enterprise customers with strict security requirements.
  • Build secure VPC peering and private connectivity solutions for customer-managed environments
  • Develop automated provisioning systems for on-premises and hybrid cloud deployments
  • Create customer-facing documentation and deployment guides for self-service infrastructure setup
• Design enterprise-grade security architectures including network isolation, encryption at rest and in transit, and identity management integration (SSO, SAML, SCIM).
• Build monitoring and observability solutions for distributed self-hosted deployments with centralized logging and alerting.
What We're Looking For:

  • 6+ years of experience designing and operating large-scale distributed systems in production.

  • Proven track record building infrastructure that supports mission-critical workloads with strict reliability guarantees.

  • Strong programming ability in at least one modern language (Go, Python, TypeScript, Java, etc.) with experience building production infrastructure systems.

  • Experience designing systems that handle high-throughput data pipelines, large-scale telemetry, or real-time services.

  • Deep understanding of failure modes in distributed systems and how to design resilient architectures.

  • Experience owning infrastructure systems from architecture through deployment and operations.

  • Ability to design infrastructure that remains operational under partial failures, degraded networks, and regional outages.

  • Strong experience writing technical design documents, architecture proposals, and operational runbooks.

Nice to have:

  • Experience with infrastructure patterns commonly used by companies like Datadog, Sentry, or Honeycomb:

    • operating large-scale telemetry ingestion pipelines

    • building high-throughput observability infrastructure

    • processing billions of events per day

  • Additional valuable experience:

    • private connectivity solutions (AWS PrivateLink, Azure Private Link, GCP Private Service Connect)

    • air-gapped or restricted network deployments

    • high-throughput streaming systems such as Kafka or RabbitMQ

    • multi-tenant SaaS infrastructure architectures

    • infrastructure supporting AI systems

Why Judgment?
Agents can’t work without this. Today’s agents hallucinate, drift, and break in production. We’re building the infrastructure that fixes this: the monitoring layer that makes agents self-improving.
We’re wired to win. We're a team of fewer than 20, but we ship like 50+ on the daily. You'll be working with olympiad medalists, debate champions, and competitive athletes who bring that same intensity to company building.
Fast track to founding. Our engineers interface directly with customers, ship code into their environments, and use their feedback to dictate what’s next on the roadmap. Everyone on the team is either an ex-founder or a founder-to-be.
We make sure our people do their best work. If you deserve a spot on the team, money will never get in the way of it. Full benefits, Equinox, and a private chef to take care of you. We sprint hard but we play hard; ask us about our Smash/Mario Kart tournaments.
We work in person in San Francisco.

HQ

Judgment Labs San Francisco, California, USA Office

425 Bush St, San Francisco, California, United States, 94108-3708
