Latent

Site Reliability Engineer

Reposted 7 Days Ago
In-Office
San Francisco, CA, USA
200K-275K Annually
Senior level
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
SRE

Location: San Francisco, CA (5 Days In-Office)

You are the infrastructure expert who enables our rapid product development and guarantees 99.9%+ stability and performance of our clinical AI platform for major health systems. Your focus on operational excellence is directly tied to a patient's access to life-saving treatment.

What We Look for in a Great Engineer

You have the intensity and technical mastery to own mission-critical infrastructure. You hold yourself and others to high standards and thrive in a high-energy, in-office culture where everyone is in it to win it.

  • Tool Proficiency: You are highly proficient with your tools—you speak command line fluently and have mastered keyboard shortcuts.

  • Ownership: You thrive on owning complex systems and have a proven track record of scaling mission-critical deployments.

  • Automation Drive: You love automating things, always finding new ways to increase your own leverage, and defining standards for operational excellence.

  • Problem Solver: You won't wait for someone else to solve a problem that you're in a position to solve; you are willing to jump into whatever needs to get done.

What You'll Work On (Responsibilities)

As our SRE, you will own the entire production environment and improve the development experience:

  • Infrastructure Ownership: Design, implement, and maintain the production environment, drawing on prior experience handling 500+ machine deployments.

  • Kubernetes Mastery: Own our containerized infrastructure, leveraging deep expertise in Kubernetes and Helm to manage deployment, scaling, and operational health.

  • CI/CD & Deployment Optimization: Optimize and streamline both the TypeScript and Python/ML deployment pipelines to support high-velocity feature releases while maintaining high reliability.

  • DevX Support: Support Developer Experience (DevX) work to streamline developer workflows, enhance tool proficiency, and improve CI/CD systems.

  • Infrastructure as Code (IaC): Manage and maintain infrastructure definitions using Terraform.

Technical Qualifications & Environment

  • IaC & Orchestration: Deep, demonstrable experience with Kubernetes, Helm, and Terraform.

  • Scaling Systems: Proven ability to architect and maintain complex, distributed systems with high-availability requirements.

  • Deployment Experience: Hands-on experience optimizing deployment pipelines for both application code (TypeScript) and machine learning models (Python/ML), as well as experience with PostgreSQL, Redis, and Kafka.

  • Core Team Member: Excitement about working five days per week in our San Francisco office.

HQ

Latent San Francisco, California, USA Office

San Francisco, California, United States, 94103
