Staff Site Reliability Engineer

TwelveLabs

Reposted 18 Days Ago

Hybrid

San Francisco, CA, USA

220K-250K Annually

Senior level

Design and build scalable infrastructure for an AI SaaS platform, focusing on multi-tenant architectures, CI/CD pipelines, and cloud optimization.

The summary above was generated by AI

Who We Are:

At TwelveLabs, we are pioneering the development of cutting-edge multimodal foundation models that have the ability to comprehend videos just like humans do. Our models have redefined the standards in video-language modeling, empowering us with more intuitive and far-reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.

With a remarkable $107 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang, and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About the Role

As a Staff Site Reliability Engineer at TwelveLabs, you will own the reliability, scalability, and operability of the infrastructure that powers our multimodal foundation models. You'll be hands-on, building systems when needed, but your primary focus will be keeping production healthy, observable, and resilient.

You'll work most closely with the product teams in the US, supporting the infrastructure behind our core AI products. This role requires deep operational instincts, strong debugging skills, and the ability to balance long-term reliability investments against the pace of an early-stage AI company.

In this role, you will

Own production reliability end to end — from deployment through monitoring, incident response, and postmortem-driven improvement.
Partner with the product engineering teams to ensure their services are reliable, observable, and operable by design.
Build and maintain observability systems (metrics, logging, tracing, alerting) that give the team clear signal on system health and performance.
Design and operate cloud infrastructure supporting AI/ML workloads.
Drive incident response — detect, diagnose, mitigate, and prevent production issues. Build the runbooks, automation, and guardrails that reduce mean time to recovery.
Identify and eliminate toil through automation, self-healing systems, and better tooling.

You may be a good fit if you have:

7+ years of experience operating production infrastructure systems, not just building them.
Strong hands-on experience with AWS and Kubernetes in production environments.
Solid fundamentals in OS internals, networking, storage, and compute — the kind that help you debug a problem at 3am without documentation.
Deep practical experience with observability (Prometheus/Grafana/Loki or equivalent), Infrastructure as Code (Terraform, Ansible), and CI/CD.
Track record of owning services end to end — deployment, monitoring, incident response, and postmortem follow-through.

Interviews

All virtual interviews will be conducted via video. To support identity verification and interview integrity, candidates may be asked to present a government-issued ID. Candidates may also be requested to disable video filters and use a clear, unobstructed background to facilitate effective communication.

Benefits and Perks

🤝 An open and inclusive culture and work environment

🚀 Work closely with a collaborative, mission-driven team on cutting-edge AI technology

🏥 Full health, dental, and vision benefits

🌴 Extremely flexible PTO and parental leave policy. Office closed the week of Christmas and New Year's

💪 Monthly wellness stipend

📚 Annual Learning & Development stipend to invest in your growth

💼 Global offices in San Francisco and Seoul, and coworking office memberships for remote team members

🛂 Visa support where applicable

🚆 Transportation stipend

🍲 Daily lunch & dinner provided

55 Green St, San Francisco, California, United States, 94111

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine