Engineer, Production Engineering

Guild.ai

Posted 4 Days Ago

In-Office

San Francisco, CA, USA

Senior level

Engineer — Production Engineering

Location: San Francisco Bay Area (Hybrid/Onsite)
Type: Full-time
Stage: Early-stage startup

About the Role

We are building the control plane for AI agents in teams and companies.

As a Production Engineer, you will own the infrastructure, security, and compliance systems that allow our platform to ship fast and run reliably at scale. This is not a traditional ops role — you will write real code, contribute directly to the product, and own the full security and compliance surface of an early-stage company.

You'll work across Kubernetes infrastructure, cloud delivery, agent sandboxing, SOC2 compliance, IT systems, and production observability — and you'll contribute to the product itself, building security-sensitive features and auditing application code for vulnerabilities.

If you want to own the production backbone for the agent-native era — from a Terraform module to a pentest to an API key implementation — we want to talk.

What You'll Own

1. Cloud & Kubernetes Infrastructure

Our Stack: Manage and evolve our production and staging infrastructure on GCP (GKE) using Terraform. Own DNS, networking, and environment configuration end-to-end.
Customer Environments: Deploy and operate within customer VPCs across AWS, Azure, and GCP — adapting to varied infrastructure constraints, security requirements, and enterprise networking configurations.
Agent Sandboxing: Build and maintain Kubernetes-based sandboxing for agent execution, ensuring agents operate within strict network boundaries and route through our API gateway rather than having unfettered internet access.
Observability: Own our observability stack, including OpenTelemetry instrumentation and integrations with New Relic and Splunk, to give the team deep visibility into system performance and agent runtime behavior.

2. Security, Compliance & IT

SOC2 & Audits: Lead infrastructure and operational work to support SOC2 compliance, including audit preparation, evidence collection, and control implementation.
Penetration Testing & Bug Bounty: Manage our HackerOne engagement — coordinating pentests, triaging incoming bug bounty reports, and driving remediation.
Product Security: Audit application code for security vulnerabilities, contribute security-sensitive product features (e.g., API key management), and ensure product and infrastructure security are coherent end-to-end.
IT & Identity: Own our IT stack — Okta, device management, and access controls — keeping the company secure as we scale.

3. CI/CD & Progressive Delivery

Deployment Pipelines: Design and maintain safe, automated CI/CD workflows supporting rollout strategies like canary and blue-green deployments.
Release Velocity: Make shipping to production a routine, boring, highly automated non-event.

What We're Looking For

Strong Fit

Experience: 5+ years in Production Engineering, Platform Engineering, or a security-focused infrastructure role, ideally at a fast-growing startup or SaaS company.
Our Stack: Strong hands-on experience with Kubernetes and GCP in production; comfortable with Terraform for managing real infrastructure.
Code over Click: Strong programming skills (Python, Go, TypeScript, etc.) with a passion for automating away toil.
Security Depth: Hands-on experience with compliance frameworks (SOC2), vulnerability management, and secure system design.

Bonus Points

Background in multi-tenant SaaS, or experience with enterprise security and procurement requirements.
Exposure to AI/ML infrastructure, particularly agent runtimes.
Experience building security-sensitive product features alongside infrastructure work.
Experience supporting pentests and bug bounty programs.
Experience deploying and operating in customer VPCs or other external cloud environments across AWS, Azure, and/or GCP — navigating enterprise networking, security, and access constraints.

Why This Role is Unique

Broad Ownership: You'll own the full security and compliance surface of an early-stage company — from SOC2 to sandboxed agent execution to IT — while also contributing directly to the product.
Agent Infrastructure: You'll design infrastructure for autonomous AI agents, not just traditional web services — introducing unique sandboxing, observability, and security challenges.
Our Infra and Theirs: You'll operate across both our own production environment and customer cloud environments, requiring you to be fluent across AWS, Azure, and GCP.
High Autonomy: As an early hire, you'll have a seat at the table to choose the tools and define the architecture that carries us to scale.

Who Thrives Here

Engineers who are as comfortable reading application code for vulnerabilities as they are writing a Terraform module.
People who enjoy owning the full security and compliance surface, not just one layer of it.
Builders who can navigate the constraints of customer enterprise environments without losing velocity.
Those who are energized — not overwhelmed — by the breadth of an early-stage technical operations role.

San Francisco, California, United States

9 Woodside Way, Ross, California, United States, 94957 9698
