Top Senior Site Reliability Engineer Jobs in San Francisco, CA

Reposted 18 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Reposted 13 Days Ago
Hybrid
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Information Technology • Software
Lead end-to-end platform reliability: define SLIs/SLOs, harden production architecture, ensure Kubernetes runtime and queue safety, run incident command for Sev1/Sev2, own observability/on-call/runbooks, and gate risky releases while delivering a prioritized reliability roadmap.
Top Skills: BullMQ, Koa, Kubernetes, Node.js, PostGraphile, Postgres, React, Redis, TypeScript
Reposted 14 Days Ago
In-Office
San Francisco Bay Area, CA
230K-310K Annually
Senior level
Artificial Intelligence • Software
Own the reliability and performance of backend systems at Gamma, building automation and tooling while leading incident response and improving system stability.
Top Skills: AWS, CloudFormation, Docker, Go, Kafka, Kubernetes, Node.js, Python, Terraform, TypeScript
Reposted 14 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
250K-295K Annually
Senior level
Artificial Intelligence • Software
As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructures, and optimize systems while implementing automation and observability practices.
Top Skills: ClickHouse, Go, Postgres, Python, TypeScript
11 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
130K-140K Annually
Senior level
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills: AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
Reposted 5 Days Ago
Remote
San Francisco Bay Area, CA
165K-230K Annually
Senior level
Information Technology • Security
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.
Top Skills: Argo CD, GitHub Actions, Go, Grafana Tanka, Jsonnet, Kubernetes, Python
Reposted 5 Days Ago
Remote
San Francisco Bay Area, CA
101K-161K Annually
Senior level
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills: Ansible, Bash, GCP, GKE, Go, Kubernetes, Pulumi, Python
Reposted 15 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
165K-330K Annually
Mid level
Software
As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.
Top Skills: TensorRT
Reposted 15 Days Ago
In-Office
San Francisco Bay Area, CA
Mid level
Information Technology • Software • Big Data Analytics
The Site Reliability Engineer will design, analyze, and troubleshoot large-scale distributed systems, focusing on operating systems and performance tuning.
Top Skills: Apache, Java
Reposted 16 Days Ago
In-Office
San Francisco Bay Area, CA
230K-390K Annually
Senior level
Artificial Intelligence • Software
As a Software Engineer on the Site Reliability team, you'll ensure system reliability, scalability, and observability while partnering with engineering teams and improving incident management processes.
Top Skills: AWS, CI/CD Tooling, Container Orchestration, Datadog, Grafana, Prometheus, Terraform
12 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills: ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform
12 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform
Reposted 7 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Artificial Intelligence
The Deployment Engineer will build and operate AI inference clusters, ensure scalable deployments, optimize allocation, and maintain infrastructure. Responsibilities include software updates, telemetry development, and collaborative improvements with teams.
Top Skills: Docker, Grafana, InfluxDB, K8s, Linux, Prometheus, Python
Reposted 7 Days Ago
Remote
San Francisco Bay Area, CA
Senior level
Software
The role involves designing, building, and maintaining AWS infrastructure, implementing IaC, developing CI/CD pipelines, automating operations, and enhancing network and security practices.
Top Skills: AWS, Bash, CI/CD, CloudFormation, Docker, Kubernetes, PowerShell, Python, Terraform
Reposted 13 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
180K-220K Annually
Senior level
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills: AWS, Docker, GCP, Kubernetes
18 Days Ago
Hybrid
San Francisco Bay Area, CA
250K-300K Annually
Senior level
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
The Staff Site Reliability Engineer will lead source control strategy, manage Git-based monorepo operations, improve developer productivity, and oversee migrations to GitHub Cloud.
Top Skills: Bazel, Buck, Buildkite, Gerrit, GitHub Actions, GitHub Cloud, GitHub Enterprise, GitLab CI, Jenkins, Pulumi, Reviewable, Terraform
18 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
160K-180K Annually
Senior level
Artificial Intelligence • Machine Learning • Software • Analytics
The role involves end-to-end ownership of AWS infrastructure, managing Kubernetes platforms, and ensuring system reliability through observability and automation. Responsibilities include incident response and maintaining CI/CD systems.
Top Skills: Argo CD, AWS, Datadog, Git, Go, Kubernetes, Python, Terraform
Reposted 8 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
132K-221K Annually
Senior level
Healthtech • Information Technology • Software
The Sr. Database Site Reliability Engineer manages the reliability and performance of Azure PostgreSQL platforms, applying SRE principles for automation and observability. Responsibilities include incident response, backup strategies, and ensuring compliance with security standards.
Top Skills: Argo CD, Azure PostgreSQL, CI/CD, Datadog, Git, Helm, Kubernetes, Terraform
9 Days Ago
Remote
San Francisco Bay Area, CA
165K-190K Annually
Senior level
Artificial Intelligence • Information Technology • Software • Automation
Own US PST coverage for releases and incidents as the first SRE; bridge infrastructure and code by working with Kubernetes, Terraform, and AWS and patching Elixir when needed; lead incident response and post-mortems; define SLOs and observability; author runbooks and support HIPAA-aligned compliance for a regulated medical-device platform.
Top Skills: AWS, Elixir, Kubernetes, Terraform
Reposted 18 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
The Site Reliability Engineer will provision and manage Kubernetes clusters, build automation tools, debug customer issues, and improve infrastructure reliability.
Top Skills: Ansible, Bash, Datadog, Go, Grafana, Helm, Kubernetes, Loki, Prometheus, Python, Terraform
Reposted 18 Days Ago
Hybrid
San Francisco Bay Area, CA
220K-250K Annually
Senior level
Software
Design and build scalable infrastructure for an AI SaaS platform, focusing on multi-tenant architectures, CI/CD pipelines, and cloud optimization.
Top Skills: Ansible, AWS, Azure, GCP, Go, Kubernetes, Python, Terraform, TypeScript
Reposted 18 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
140K-205K Annually
Senior level
Information Technology • Legal Tech
The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.
Top Skills: AWS, Chef, Datadog, Go, Grafana, Java, Prometheus, Puppet, Python, Salt, Terraform
Reposted 18 Days Ago
In-Office
San Francisco Bay Area, CA
163K-302K Annually
Senior level
Healthtech • Biotech
The role involves architecting and implementing Infrastructure as Code (IaC) solutions for ML and HPC workloads, ensuring global availability, automating processes, leading technical teams, and optimizing costs while maintaining compliance.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, ELK Stack, GCP, Go, Grafana, NVIDIA CUDA, Prometheus, Python, Spacelift, TensorFlow, Terraform
Reposted 9 Days Ago
Remote
San Francisco Bay Area, CA
100K-140K Annually
Mid level
Artificial Intelligence • Information Technology • Consulting
The Linux Systems Administrator will maintain and troubleshoot Linux systems, support network services, and work on systems integration while collaborating with infrastructure teams.
Top Skills: DHCP, DNS, Linux, NTP, Python
Reposted 9 Days Ago
Remote
San Francisco Bay Area, CA
Senior level
Information Technology • Cryptocurrency
The Site Reliability Engineer will lead technical initiatives, architect solutions, troubleshoot issues, mentor team members, and improve observability practices.
Top Skills: Argo CD, Bash, ELK Stack, GCP, Go, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform