Site Reliability Engineer

Latent

Reposted 7 Days Ago

In-Office

San Francisco, CA, USA

200K-275K Annually

Senior level

As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.

The summary above was generated by AI

SRE

Location: San Francisco, CA (5 Days In-Office)

You are the infrastructure expert who enables our rapid product development and guarantees 99.9%+ stability and performance of our clinical AI platform for major health systems. Your focus on operational excellence is directly tied to a patient's access to life-saving treatment.

What We Look for in a Great Engineer

You have the intensity and technical mastery to own mission-critical infrastructure. You hold yourself and others to high standards and thrive in a high-energy, in-office culture where everyone is in it to win it.

Tool Proficiency: You are highly proficient with your tools—you speak command line fluently and have mastered keyboard shortcuts.
Ownership: You thrive on owning complex systems and have a proven track record of scaling mission-critical deployments.
Automation Drive: You love automating things, always finding new ways to increase your own leverage, and defining standards for operational excellence.
Problem Solver: You won't wait for someone else to solve a problem that you're in a position to solve; you are willing to jump into whatever needs to get done.

What You'll Work On (Responsibilities)

As our SRE, you will own the entire production environment and improve the development experience:

Infrastructure Ownership: Design, implement, and maintain the production environment, drawing on prior experience handling deployments of 500+ machines.
Kubernetes Mastery: Own our containerized infrastructure, leveraging deep expertise in Kubernetes and Helm to manage deployment, scaling, and operational health.
CI/CD & Deployment Optimization: Optimize and streamline both the TypeScript and Python/ML deployment pipelines to support high-velocity feature release while maintaining the highest reliability.
DevX Support: Support Developer Experience (DevX) work to streamline developer workflows, enhance tool proficiency, and improve CI/CD systems.
Infrastructure as Code (IaC): Manage and maintain infrastructure definitions using Terraform.

Technical Qualifications & Environment

IaC & Orchestration: Deep, demonstrable experience with Kubernetes, Helm, and Terraform.
Scaling Systems: Proven ability to architect and maintain complex, distributed systems with high-availability requirements.
Deployment Experience: Hands-on experience optimizing deployment pipelines for both application code (TypeScript) and machine learning models (Python/ML), as well as experience with PostgreSQL, Redis, and Kafka.
Core Team Member: Excitement about working five days per week in our San Francisco office.

San Francisco, California, United States, 94103

Similar Jobs

Domino Data Lab

Site Reliability Engineer

7 Days Ago

Easy Apply

Remote or Hybrid

200K-230K Annually

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

Uniphore

Site Reliability Engineer

11 Days Ago

In-Office

Palo Alto, CA, USA

233K-336K Annually

Expert/Leader

Artificial Intelligence • Machine Learning

Lead platform reliability and automation at scale by building production Go services, Kubernetes operators, multi-cloud infrastructure, and self-service tooling. Provide technical leadership through architecture, code, on-call escalation ownership, incident remediation, and mentorship to elevate engineering teams' operational maturity.

Top Skills: AWS, Azure, Controller-Runtime, GCP, Go, Kubernetes, Kubernetes Operator, Terraform

CrowdStrike

Site Reliability Engineer

Yesterday

Hybrid

Sunnyvale, CA, USA

140K-215K Annually

Expert/Leader

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity

Lead and manage an SRE/Platform engineering team to ensure reliability, scalability, and performance of CrowdStrike's cloud-native security platform. Provide technical leadership, incident command, SLO-driven reliability, capacity planning, automation, and mentorship while collaborating with cross-functional teams.

Top Skills: Apache Flink, Apache Kafka, AWS, Azure, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, OpenTelemetry, Prometheus, Splunk

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (PitchBook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
