Senior Infrastructure Engineer

Judgment Labs

Reposted 19 Days Ago

In-Office

San Francisco, CA, USA

Senior level

Design, build, and operate multi-region, highly available cloud and self-hosted/BYOC deployment architectures across AWS, GCP, and Azure. Implement secure networking, compliance/data residency solutions, automated provisioning, and observability for distributed customer environments. Own infrastructure roadmap, reliability, and enterprise deployment lifecycle, including documentation and customer-facing implementation guides.

The summary above was generated by AI

Senior Cloud Infrastructure Engineer

San Francisco · On Site · Full Time

Judgment Labs is building the infrastructure for continual learning in long-horizon AI agents.

The next generation of agents will not improve from prompts alone. They will improve from experience: the tasks they attempt, the tools they use, the mistakes they make, the edge cases they encounter, and the outcomes they produce in production. The hard part is turning that raw experience into high-quality data that can actually improve the system.

Judgment builds the infrastructure to do that. We turn long agent trajectories into clean, structured data for evals, labeling, rubric generation, context engineering, and RL workflows. Instead of only showing teams what happened, Judgment helps decide what matters, what should be learned from, and how that learning should flow back into the agent.

Databricks built the data infrastructure for analytics. Judgment is building the learning infrastructure for agents.

We’ve raised $30M+ from Lightspeed, SV Angel, Valor Equity Partners, and others.

The Role

We’re looking for a Senior Cloud Infrastructure Engineer to own the infrastructure that lets Judgment run reliably across our cloud, customer environments, and enterprise deployments.

This role focuses on cloud/platform infrastructure: Terraform, EKS, ArgoCD/Kargo, IAM, DNS, observability, CI/CD, multi-region architecture, BYOC, self-hosted deployments, private connectivity, and enterprise-grade reliability.

You’ll work on the systems that keep high-throughput telemetry ingestion, ClickHouse, RabbitMQ, Temporal, evaluation workers, and customer-facing services running under real production load. You’ll also make Judgment deployable for customers with strict security and infrastructure requirements: multi-region, data residency, private networking, self-hosted, air-gapped, and Bring Your Own Cloud environments.

Interesting Technical Challenges

Enterprise-grade deployment architecture. Run Judgment in customer environments — self-hosted, air-gapped, or BYOC — while keeping operations, upgrades, observability, and reliability sane.
Multi-region reliability. Design failover, disaster recovery, data residency, and deployment patterns for customers that cannot tolerate downtime or ambiguous data movement.
Infrastructure for high-throughput telemetry. Support ingestion systems parsing and persisting hundreds of thousands of spans per second, with graceful backpressure and clear failure modes.
Operating stateful systems at scale. Keep ClickHouse, RabbitMQ, Temporal, evaluation workers, and supporting services healthy as workloads grow and customer traffic becomes spiky.
Private and secure connectivity. Build secure paths into customer environments using network isolation, IAM, SSO/SAML/SCIM, encryption, private connectivity, and restricted-network deployment patterns.
A single operational story across many deployment modes. Cloud, multi-region, BYOC, and self-hosted deployments should not become four totally different products to operate.
Safe production rollouts. Build deployment automation, environment parity, feature-flag discipline, CI/e2e reliability, monitoring, and rollback mechanisms so the team can move fast without breaking customer trust.

What You’ll Do

Own cloud infrastructure for production services across Terraform, EKS, ArgoCD/Kargo, IAM, DNS, networking, metrics, CI/CD, and deployment automation.
Build and operate infrastructure for trace ingestion, evaluation workers, RabbitMQ, Temporal, ClickHouse, and the systems that support Judgment’s core product.
Design multi-region and enterprise deployment architectures, including data residency, automatic failover, disaster recovery, and customer-managed environments.
Build secure deployment patterns for BYOC, self-hosted, private-network, and restricted environments.
Implement private connectivity, identity integrations, network isolation, encryption patterns, and enterprise security requirements.
Improve observability, alerting, runbooks, incident response, and operational tooling so the team can debug root causes quickly rather than chase symptoms.
Partner with backend engineers on reliability, scaling limits, queue behavior, storage growth, ingestion throughput, and production incidents.
Make deployments safer and faster through automation, rollout strategies, environment parity, CI reliability, e2e test health, and better internal tooling.
Work directly with customers when deployment, networking, security, or production environment constraints are the blocker.
Raise the bar for infrastructure quality through design docs, code reviews, operational rigor, and clean abstractions.

What We’re Looking For

Strong experience designing, building, and operating production cloud infrastructure for real customer-facing systems.
Deep understanding of distributed systems failure modes, especially around stateful services, queues, networking, storage, degraded networks, partial outages, and regional failures.
Strong programming ability in a modern language and a bias toward automating repeated operational work.
Experience with Kubernetes / EKS or similar orchestration systems, infrastructure-as-code, CI/CD, cloud networking, IAM, DNS, and production observability.
Ability to reason about reliability, security, deployment ergonomics, and developer velocity at the same time.
Experience owning infrastructure systems from design through implementation, rollout, incident response, and long-term maintenance.
Comfort working directly with customers on enterprise deployment, networking, compliance, or security constraints.
Clear written communication. You can write architecture proposals, operational runbooks, incident notes, and crisp tradeoff docs.

Nice to Have

Experience with Terraform, EKS, ArgoCD, Kargo, AWS networking, IAM, DNS, and production metrics/logging systems.
Experience operating ClickHouse, RabbitMQ, Temporal, Kafka, or other stateful production infrastructure.
Experience with private connectivity such as AWS PrivateLink, Azure Private Link, or GCP Private Service Connect.
Experience building BYOC, self-hosted, air-gapped, hybrid-cloud, or enterprise SaaS deployment models.
Experience with SSO, SAML, SCIM, secrets management, encryption, network isolation, and enterprise security reviews.
Experience with observability infrastructure, telemetry ingestion, or platforms like Datadog, Honeycomb, Sentry, or similar systems.
Experience supporting AI infrastructure, LLM evaluation workloads, or high-throughput event pipelines.

Why Judgment?

We’re building the learning infrastructure for agents. As agents move from demos to production, the bottleneck is no longer just better prompts. It is turning real production experience into high-quality data for evals, labeling, rubric generation, context engineering, and RL workflows.
Infrastructure is a product requirement here. Customers need Judgment to run reliably across our cloud, enterprise environments, and customer-managed deployments. Deployment quality directly affects whether they can use us.
The systems are real. High-throughput ingestion, stateful services, workflow orchestration, ClickHouse, LLM scoring, multi-region reliability, and BYOC all show up early.
This is a Databricks-scale infrastructure opportunity. Databricks built the data infrastructure for analytics. Judgment is building the learning infrastructure for agents.
You’ll have broad ownership. This is a small team, so infrastructure engineers own architecture, implementation, operations, and customer deployment outcomes.
In person in San Francisco. We work together in person because the problems are hard, the product is moving fast, and the feedback loops matter.

425 Bush St, San Francisco, California, United States, 94108-3708
