Top Reliability Engineer Jobs in San Francisco, CA

Reposted 7 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
Internship
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform
Reposted 7 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
Mid level
Cloud • Security • Software • Cybersecurity • Automation
As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.
Top Skills: Ansible, AWS, ELK, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform
Reposted 2 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Software
Drive reliability testing and qualification of cellular base stations, collaborating with R&D for long-term reliability and product lifecycle support.
Top Skills: Excel, MS Office, MS Word, PTC Windchill, Python, Telcordia
Reposted 18 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Reposted 13 Days Ago
Hybrid
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Information Technology • Software
Lead end-to-end platform reliability: define SLIs/SLOs, harden production architecture, ensure Kubernetes runtime and queue safety, run incident command for Sev1/Sev2, own observability/on-call/runbooks, and gate risky releases while delivering a prioritized reliability roadmap.
Top Skills: BullMQ, Koa, Kubernetes, Node.js, PostGraphile, Postgres, React, Redis, TypeScript
Reposted 13 Days Ago
In-Office
San Francisco Bay Area, CA
175K-215K Annually
Junior
Automotive
Software Reliability Engineers at Waymo ensure the stable operation of autonomous systems, collaborating on reliability solutions and system performance improvements.
Top Skills: C++, Java, Python
Reposted 13 Days Ago
Hybrid
San Francisco Bay Area, CA
150K-180K Annually
Senior level
Aerospace • Automation
The Senior Reliability Engineer will establish AeroVect's reliability engineering practice, leading reliability analyses and managing external testing programs to ensure product durability and performance across operational environments.
Top Skills: Accelerated Life Testing, Data Analysis, Environmental Testing, FMEA, FTA, MTBF, MTTR, RBD
Reposted 4 Days Ago
Remote
San Francisco Bay Area, CA
146K-162K Annually
Senior level
Healthtech • Software
The Database Reliability Engineer manages and maintains cloud-based database infrastructures for SaaS applications, focusing on automation, process improvement, and collaboration with engineering teams.
Top Skills: Ansible, AWS, Azure, Azure Data Factory, C#, Databricks, GCP, Git, Grafana, InfluxDB, MySQL, Postgres, PowerShell, Python, SQL, SQL Server, Terraform
Reposted 4 Days Ago
Remote
San Francisco Bay Area, CA
75K-150K Annually
Senior level
Database • Analytics
As a Database Reliability Engineer at ClickHouse, you'll improve reliability, manage escalation processes, support incident response, and enhance database performance while collaborating across teams.
Top Skills: AWS, Azure, C++, ClickHouse, Google Cloud Platform, Python, Shell, SQL
5 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
Senior level
Software
Lead reliability engineering for Silicon Photonics hardware: define and validate reliability models, perform MTBF/MTBCF predictions, analyze field data, direct verification testing and root-cause analysis, drive corrective actions, and mentor cross-functional teams to improve product reliability.
Top Skills: Derating Analysis, DFMEA, MTBCF, MTBF, Sherlock, Silicon Photonics, Telcordia, Thermal Design, Windchill QS
11 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
227K-272K Annually
Senior level
eCommerce • Healthtech • Kids + Family • Retail • Social Media
Own and evolve Babylist's AWS infrastructure and developer platform using Terraform and Kubernetes. Improve CI/CD reliability, support engineers across environments, define monitoring and alerting standards, lead incident response and postmortems, and shape platform architecture to scale for millions of users.
Top Skills: AWS, CDN, CircleCI, Cronitor, Datadog, DNS, EKS, GitHub Actions, Kubernetes, Load Balancers, MySQL, PagerDuty, RDS, Redis, Ruby on Rails, Sentry, Sidekiq, Terraform
11 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
130K-140K Annually
Senior level
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills: AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
6 Days Ago
Remote
San Francisco Bay Area, CA
Mid level
Information Technology • Software • Database • Automation
Owner of on-prem reliability and escalations: reproduce and resolve L2/L3 issues across heterogeneous Kubernetes environments, build diagnostics and automation, improve CI and e2e test stability, establish performance baselines, harden install/upgrade flows, and write tooling in Python/Go/Rust to reduce repeat incidents.
Top Skills: Benchmarking, CI, CI/CD, Containers, E2E Testing, Go, Health Checks, Helm, Installers, Integration Testing, Kubernetes, Load Generation, Logs, Metrics, Networking, Observability, Packaging, Profiling, Python, RBAC, Rust, Storage, Support Bundles, Traces
Reposted 21 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
160K-235K Annually
Senior level
Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth
The Senior Site Reliability Engineer will enhance the reliability and security of infrastructure for in-home healthcare services, using cloud technology and automation to improve systems and processes.
Top Skills: AWS, Bash, GCP, Python, Terraform, TypeScript
Reposted 16 Days Ago
In-Office
San Francisco Bay Area, CA
105K-137K Annually
Senior level
Food • Marketing Tech • Manufacturing
The Senior Reliability Engineer enhances equipment reliability, reduces downtime, and improves maintenance strategies across production. This role involves collaboration with engineering and operations, leading reliability programs, and mentoring junior staff.
Top Skills: Advanced Analytics, CMMS, Digital Tools, Reliability Modeling Tools
Reposted 16 Days Ago
Hybrid
San Francisco Bay Area, CA
164K-197K Annually
Mid level
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
The Design Reliability Engineer will establish reliability targets, lead DFMEA processes, develop test plans, and implement monitoring systems for sensors and automotive electronics.
Top Skills: NumPy, pandas, PySpark, Python, SciPy
12 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills: ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform
12 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
218K-257K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, monitoring, and incident response for AI infrastructure; build automation and CI/CD tooling; manage Kubernetes/Docker production workloads; partner with infrastructure, security, and compliance; improve observability and documentation; develop internal full‑stack tooling in Go or Python.
Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Log Aggregation, Network Security, Puppet, Python, Ruby, Salt, Terraform
12 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform
Reposted 13 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
180K-220K Annually
Senior level
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills: AWS, Docker, GCP, Kubernetes
Reposted 14 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
25 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
160K-255K Annually
Senior level
Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth
The Staff Site Reliability Engineer at Sprinter Health will enhance the reliability and security of cloud infrastructure, automate processes, and improve system observability across healthcare delivery operations.
Top Skills: Access Management, AWS, Bash, CI/CD Systems, Cloud Networking, Container Systems, GCP, Identity Management, Logging Platforms, Monitoring Platforms, Observability Platforms, Python, Secrets Management, Terraform, TypeScript
Reposted 16 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
190K-235K Annually
Senior level
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills: AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python
Reposted 20 Days Ago
In-Office
San Francisco Bay Area, CA
160K-220K Annually
Senior level
Cloud
The role involves designing, optimizing, and maintaining PostgreSQL and MySQL databases, ensuring high availability, reliability, and performance for mission-critical systems, while automating operational tasks and responding to incidents.
Top Skills: Ansible, AWS, Datadog, GCP, Go, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Python, Terraform
12 Days Ago
Remote or Hybrid
San Francisco Bay Area, CA
Senior level
Software
Lead reliability activities for photonic integrated circuits (PICs): evaluate failure modes, coordinate accelerated stress tests, develop life models from aging data, and drive failure mode analyses across design, development, and production teams.