Top Senior Site Reliability Engineer Jobs in San Francisco, CA

MongoDB

Senior Site Reliability Engineer, Fleet Management

Reposted 18 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.

Top Skills: AWS, Azure, Cert-Manager, Coredns, Crds, Cri, Csi, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform

Vizcom

Senior Platform & Reliability Engineer (SRE)

Reposted 13 Days Ago

Hybrid

San Francisco Bay Area, CA

Senior level

Artificial Intelligence • Information Technology • Software

Lead end-to-end platform reliability: define SLIs/SLOs, harden production architecture, ensure Kubernetes runtime and queue safety, run incident command for Sev1/Sev2, own observability/on-call/runbooks, and gate risky releases while delivering a prioritized reliability roadmap.

Top Skills: Bullmq, Koa, Kubernetes, Node.js, Postgraphile, Postgres, React, Redis, Typescript

Gamma (gamma.app)

Site Reliability Engineer

Reposted 14 Days Ago

In-Office

San Francisco Bay Area, CA

230K-310K Annually

Senior level

Artificial Intelligence • Software

Own the reliability and performance of backend systems at Gamma, building automation and tooling while leading incident response and improving system stability.

Top Skills: AWS, CloudFormation, Docker, Go, Kafka, Kubernetes, Node.js, Python, Terraform, Typescript

Unify (unifygtm.com)

Staff Site Reliability Engineer, Tech Lead

Reposted 14 Days Ago

Remote or Hybrid

San Francisco Bay Area, CA

250K-295K Annually

Senior level

Artificial Intelligence • Software

As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructures, and optimize systems while implementing automation and observability practices.

Top Skills: Clickhouse, Go, Postgres, Python, Typescript

Circle (circle.so)

Senior Site Reliability Engineer

11 Days Ago

Easy Apply

Remote

San Francisco Bay Area, CA

130K-140K Annually

Senior level

Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software

Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.

Top Skills: AWS, Clickhouse, Kubernetes, Llm-Based Tools (Copilots), MySQL, Postgres, Redis

SimSpace

Staff Site Reliability Engineer

Reposted 5 Days Ago

Remote

San Francisco Bay Area, CA

165K-230K Annually

Senior level

Information Technology • Security

The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.

Top Skills: Argocd, Github Actions, Go, Grafana Tanka, Jsonnet, Kubernetes, Python

Arista Networks

FedRAMP Site Reliability Engineer (FedSRE) - CloudVision

Reposted 5 Days Ago

Remote

San Francisco Bay Area, CA

101K-161K Annually

Senior level

Cloud • Software • Analytics

Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.

Top Skills: Ansible, Bash, GCP, Gke, Go, Kubernetes, Pulumi, Python

Baseten

SRE

Reposted 15 Days Ago

Remote or Hybrid

San Francisco Bay Area, CA

165K-330K Annually

Mid level

Software

As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.

Top Skills: Tensorrt

Intelliswift

Site Reliability Engineer

Reposted 15 Days Ago

In-Office

San Francisco Bay Area, CA

Mid level

Information Technology • Software • Big Data Analytics

The Site Reliability Engineer will design, analyze, and troubleshoot large-scale distributed systems, focusing on operating systems and performance tuning.

Top Skills: Apache, Java

Sierra

Software Engineer, Site Reliability (SRE)

Reposted 16 Days Ago

In-Office

San Francisco Bay Area, CA

230K-390K Annually

Senior level

Artificial Intelligence • Software

As a Software Engineer on the Site Reliability team, you'll ensure system reliability, scalability, and observability while partnering with engineering teams and improving incident management processes.

Top Skills: AWS, Ci/Cd Tooling, Container Orchestration, Datadog, Grafana, Prometheus, Terraform

Coinbase

Senior Site Reliability Engineer, Workforce Identity

12 Days Ago

Easy Apply

Remote

San Francisco Bay Area, CA

186K-219K Annually

Senior level

Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3

Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.

Top Skills: Abac, Auth0, AWS, Azure, C#, Ci/Cd, Container Orchestration, Duo, Entraid, GCP, Generative Ai, Git, Go, Iac, Java, Mfa, Okta, Ping, Python, Rbac, Ruby, Sso, Terraform

Coinbase

Senior Site Reliability Engineer, Core AI Infrastructure

12 Days Ago

Easy Apply

Remote

San Francisco Bay Area, CA

186K-219K Annually

Senior level

Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3

Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.

Top Skills: Ansible, AWS, Bash, Chef, Ci/Cd, Docker, Ec2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform

Cerebras Systems

Staff Site Reliability Engineer – Automation and Platform

Reposted 7 Days Ago

In-Office or Remote

San Francisco Bay Area, CA

Senior level

Artificial Intelligence

The Deployment Engineer will build and operate AI inference clusters, ensure scalable deployments, optimize allocation, and maintain infrastructure. Responsibilities include software updates, telemetry development, and collaborative improvements with teams.

Top Skills: Docker, Grafana, Influxdb, K8S, Linux, Prometheus, Python

CentralSquare Technologies

Lead Site Reliability Engineer - Remote

Reposted 7 Days Ago

Remote

San Francisco Bay Area, CA

Senior level

Software

The role involves designing, building, and maintaining AWS infrastructure, implementing IaC, developing CI/CD pipelines, automating operations, and enhancing network and security practices.

Top Skills: AWS, Bash, Ci/Cd, CloudFormation, Docker, Kubernetes, Powershell, Python, Terraform

Zocdoc

Senior Site Reliability Engineer

Reposted 13 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

180K-220K Annually

Senior level

Healthtech • Information Technology • Software • Telehealth

The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.

Top Skills: AWS, Docker, GCP, Kubernetes

Zoox

Staff Software Engineer - SRE, GitHub & CI/CD Infrastructure

18 Days Ago

Hybrid

San Francisco Bay Area, CA

250K-300K Annually

Senior level

Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing

The Staff Site Reliability Engineer will lead source control strategy, manage Git-based monorepo operations, improve developer productivity, and oversee migrations to GitHub Cloud.

Top Skills: Bazel, Buck, Buildkite, Gerrit, Github Actions, Github Cloud, Github Enterprise, Gitlab Ci, Jenkins, Pulumi, Reviewable, Terraform

Socure

Senior Software Engineer - SRE

18 Days Ago

Remote or Hybrid

San Francisco Bay Area, CA

160K-180K Annually

Senior level

Artificial Intelligence • Machine Learning • Software • Analytics

The role involves end-to-end ownership of AWS infrastructure, managing Kubernetes platforms, and ensuring system reliability through observability and automation. Responsibilities include incident response and maintaining CI/CD systems.

Top Skills: Argocd, AWS, Datadog, Git, Go, Kubernetes, Python, Terraform

CoverMyMeds

Sr. Database Site Reliability Engineer (DB SRE)

Reposted 8 Days Ago

In-Office or Remote

San Francisco Bay Area, CA

132K-221K Annually

Senior level

Healthtech • Information Technology • Software

The Sr. Database Site Reliability Engineer manages the reliability and performance of Azure PostgreSQL platforms, applying SRE principles for automation and observability. Responsibilities include incident response, backup strategies, and ensuring compliance with security standards.

Top Skills: Argocd, Azure Postgresql, Ci/Cd, Datadog, Git, Helm, Kubernetes, Terraform

Xpert Development LLC

Senior DevOps & Site Reliability Engineer

9 Days Ago

Remote

San Francisco Bay Area, CA

165K-190K Annually

Senior level

Artificial Intelligence • Information Technology • Software • Automation

Own US PST coverage for releases and incidents as the first SRE; bridge infrastructure and code by working with Kubernetes, Terraform, and AWS and patching Elixir when needed; lead incident response and post-mortems; define SLOs and observability; author runbooks and support HIPAA-aligned compliance for a regulated medical-device platform.

Top Skills: AWS, Elixir, Kubernetes, Terraform

Andromeda (andromeda.ai)

Site Reliability Engineer - AI Infrastructure

Reposted 18 Days Ago

In-Office or Remote

San Francisco Bay Area, CA

Senior level

Artificial Intelligence • Cloud • Information Technology • Software

The Site Reliability Engineer will provision and manage Kubernetes clusters, build automation tools, debug customer issues, and improve infrastructure reliability.

Top Skills: Ansible, Bash, Datadog, Go, Grafana, Helm, Kubernetes, Loki, Prometheus, Python, Terraform

TwelveLabs

Staff Site Reliability Engineer

Reposted 18 Days Ago

Hybrid

San Francisco Bay Area, CA

220K-250K Annually

Senior level

Software

Design and build scalable infrastructure for an AI SaaS platform, focusing on multi-tenant architectures, CI/CD pipelines, and cloud optimization.

Top Skills: Ansible, AWS, Azure, GCP, Go, Kubernetes, Python, Terraform, Typescript

Cooley

Senior Technology Site Reliability Engineer

Reposted 18 Days Ago

In-Office or Remote

San Francisco Bay Area, CA

140K-205K Annually

Senior level

Information Technology • Legal Tech

The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.

Top Skills: AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform

Genentech

Principal Site Reliability Engineer (Intelligent Automation)

Reposted 18 Days Ago

In-Office

San Francisco Bay Area, CA

163K-302K Annually

Senior level

Healthtech • Biotech

The role involves architecting and implementing Infrastructure as Code (IaC) solutions for ML and HPC workloads, ensuring global availability, automating processes, leading technical teams, and optimizing costs while maintaining compliance.

Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, Elk Stack, GCP, Go, Grafana, Nvidia Cuda, Prometheus, Python, Spacelift, TensorFlow, Terraform

Nebius

Site Reliability Engineer

Reposted 9 Days Ago

Remote

San Francisco Bay Area, CA

100K-140K Annually

Mid level

Artificial Intelligence • Information Technology • Consulting

The Linux Systems Administrator will maintain and troubleshoot Linux systems, support network services, and work on systems integration while collaborating with infrastructure teams.

Top Skills: Dhcp, Dns, Linux, Ntp, Python

Strike (simplistic.com)

Site Reliability Engineer

Reposted 9 Days Ago

Remote

San Francisco Bay Area, CA

Senior level

Information Technology • Cryptocurrency

The Site Reliability Engineer will lead technical initiatives, architect solutions, troubleshoot issues, mentor team members, and improve observability practices.

Top Skills: Argocd, Bash, Elk Stack, GCP, Go, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform