Top Reliability Engineer Jobs in San Francisco, CA

3 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
142K-199K Annually
Senior level
Artificial Intelligence • Fintech • Information Technology • Logistics • Payments • Business Intelligence • Generative AI
Lead design, automation, and maintenance of cloud-based database infrastructure (primarily SQL Server and MySQL). Improve reliability with monitoring, HA/DR, automation, troubleshooting, on-call support, and mentoring of junior engineers while collaborating across teams.
Top Skills: Aurora, AWS, Bash, Failover Clustering, MySQL, New Relic, Orchestrator, PMM, Python, RDS, Ruby, SQL Server, VividCortex
5 Days Ago
Hybrid
San Francisco Bay Area, CA
182K-250K Annually
Senior level
Healthtech • Social Impact • Software
Define and scale reliability practices across the company by creating SLO/SLA frameworks, improving observability, evolving incident response, building self-service tooling and scorecards, and driving cross-team adoption to enable teams to build and operate reliable production systems at scale.
Top Skills: AWS, Datadog, EKS, Kubernetes, Postgres, Terraform
Reposted 13 Days Ago
Easy Apply
Hybrid
San Francisco Bay Area, CA
204K-240K Annually
Senior level
Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software
The Senior Hardware Reliability Engineer ensures product reliability through planning, testing, and collaboration across engineering and operations. Responsibilities include leading investigations, analyzing failure data, and designing reliability strategies throughout the product lifecycle.
Top Skills: Environmental Testing, Failure Analysis, Firmware Engineering, Hardware Reliability, Reliability Modeling, Stress Testing
Reposted 3 Days Ago
Hybrid
San Francisco Bay Area, CA
167K-226K Annually
Senior level
Security • Software • Cybersecurity • Automation
As a Senior Site Reliability Engineer, you will enhance the reliability of Drata’s product teams through automation, architecture reviews, and operational excellence using cloud-native technologies.
Top Skills: AIOps, AWS, Bash, Datadog, Docker, Git, GitHub Actions, Kubernetes, Linux, MySQL, Python, Terraform
Reposted 4 Days Ago
Hybrid
San Francisco Bay Area, CA
160K-250K Annually
Senior level
Artificial Intelligence • Fintech • Payments • Business Intelligence • Financial Services • Generative AI
Lead design and delivery of scalable cloud infrastructure for the Spend product. Embed with development teams to drive reliability, performance, observability, incident response, and automation. Own SLOs, runbooks, DevOps metrics, and collaborate with central DevOps and security teams to ensure compliance and resilience. Lead infrastructure projects including new service launches, data centre migrations, and modernising data pipelines.
Top Skills: Analytics Pipelines, AWS, Data Streaming, DevOps, GCP, Incident Response, Kubernetes, Observability, SLOs, SRE
Reposted 2 Days Ago
In-Office
San Francisco Bay Area, CA
139K-178K Annually
Senior level
Artificial Intelligence • Information Technology • Machine Learning • Marketing Tech • Software • Biotech • Design
The Hardware Reliability Engineer plans and executes reliability testing, develops testing methods, performs failure analysis, and collaborates with cross-functional teams to ensure product quality.
Top Skills: Data Analysis, Electrical Engineering, Environmental Reliability, Mechanical Engineering, Reliability Testing
Reposted 8 Days Ago
In-Office
San Francisco Bay Area, CA
160K-300K Annually
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI
As a Site Reliability Engineer, you'll design and improve critical production systems, lead incident response, and enhance observability while embedding with product teams to ensure reliability and performance at scale.
Top Skills: AWS, C++, CI/CD, Go, Python, Rust
Reposted 8 Days Ago
Hybrid
San Francisco Bay Area, CA
214K-260K Annually
Senior level
Artificial Intelligence • Information Technology • Machine Learning • Natural Language Processing • Productivity • Software • Generative AI
The SRE will ensure the reliability of backend systems, scale Kubernetes-based control planes, and improve automation mechanisms while managing incident processes.
Top Skills: AWS, Azure, Docker, GCP, Java, Kubernetes, Linux, Terraform
9 Days Ago
In-Office
San Francisco Bay Area, CA
182K-242K Annually
Senior level
Cloud • Information Technology • Machine Learning
Own, build, and operate production reliability tooling and systems across the cloud stack. Lead projects to improve availability, scalability, automation, observability, and incident response. Ship production services in Python/Go, participate in on-call, reduce toil through automation, and maintain long-lived platform frameworks.
Top Skills: Cloud-Native, Go, GPU-Accelerated Infrastructure, Kubernetes, Metrics, Python, SLOs/SLIs, Structured Logs, Tracing
Reposted 21 Hours Ago
Remote or Hybrid
San Francisco Bay Area, CA
175K-200K Annually
Senior level
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills: AWS, CloudFormation, Datadog, Kubernetes, OpenTelemetry, Ruby, Ruby on Rails, Terraform
Reposted 10 Days Ago
Easy Apply
In-Office
San Francisco Bay Area, CA
Mid level
AdTech
As a Site Reliability Engineer, you'll maintain the infrastructure for systems, ensure efficiency, automate processes, monitor databases, and participate in architecture discussions.
Top Skills: Amazon Kinesis, AWS Lambda, AWS SNS, BigQuery, Docker, GCP (Google Cloud Platform), GitLab, Google Cloud Functions, Google Cloud Run, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, Prometheus, Spanner, SQL, Terraform
Reposted 10 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
5 Days Ago
Hybrid
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Machine Learning • Software • Generative AI
Help design, scale, and improve platform reliability: define SLOs/SLIs, run on-call and incident response, build observability, improve resilience to external dependencies, enhance CI/CD and deploy safety, optimize cost and capacity, and influence infrastructure architecture.
Top Skills: Amplitude, AWS, Cloud Run, Containers, ECS, Fargate, Firebase, GCP, Kubernetes, Modal, Next.js, Node.js, Python, React, Redis, Sentry, Serverless, TypeScript, Upstash
Reposted 5 Days Ago
Hybrid
San Francisco Bay Area, CA
170K-226K Annually
Senior level
Artificial Intelligence • Hardware • Robotics • Software
The role involves developing and executing test strategies for autonomous systems, collaborating with engineering teams for reliability, and analyzing data for risk assessments.
Top Skills: Python, SQL
Reposted 11 Days Ago
Hybrid
San Francisco Bay Area, CA
147K-278K Annually
Senior level
Cloud • Software
Responsible for maintaining FedRAMP-compliant infrastructure, collaborating with software engineers, and ensuring system availability and security. Duties include infrastructure design, automation, monitoring, and incident response.
Top Skills: AWS, Go, Kubernetes, Puppet, Python, Terraform
Reposted 2 Days Ago
Remote
San Francisco Bay Area, CA
150K-220K Annually
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI
The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.
Top Skills: AWS, Bash, Go, Kubernetes, Python, Slurm, Terraform
Reposted 6 Days Ago
Hybrid
San Francisco Bay Area, CA
160K-210K Annually
Senior level
Wearables
The Hardware Reliability Engineer will oversee reliability testing for Skip's wearable devices, manage FMEA, perform environmental stress testing, and lead root cause analyses to ensure product performance in real-world conditions.
Top Skills: Environmental Testing, FMEA, HALT, HASS, Thermal Chambers, Vibration Tables, Weibull Analysis
13 Days Ago
Easy Apply
Hybrid
San Francisco Bay Area, CA
210K-270K Annually
Senior level
Healthtech • Information Technology • Software • Telehealth
Lead reliability efforts for Zocdoc's cloud-based, consumer-facing services: monitor and maintain production systems, automate tooling and infrastructure, support scaling and performance, debug production incidents, and work with product teams to improve uptime and reliability.
Top Skills: AWS, Distributed Systems, DNS, Docker, GCP, GenAI, HTTP, HTTPS, Kubernetes, Load Balancer, Microservices, NTP, Reverse Proxy, TCP/IP, TLS, Web Application Firewall
Reposted 13 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
161K-284K Annually
Senior level
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
The Senior Site Reliability Engineer will enhance reliability of Block's platform, improve incident response using AI tools, and coordinate incident management. Responsibilities include building reliable systems, standardizing tools, and leading high-severity incidents during on-call rotations.
Top Skills: Amazon Web Services, Datadog, DynamoDB, gRPC, HTTP, Istio, Java, JSON, Kotlin, Kubernetes, LaunchDarkly, MySQL, Protocol Buffers, Terraform, Vitess
Reposted 4 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
150K-200K Annually
Senior level
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
As a Site Reliability Engineer, you will ensure system stability and resilience, define reliability standards, and automate operational processes while collaborating cross-functionally to improve performance and reduce incidents.
Top Skills: Bash, CI/CD, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python
Reposted 4 Days Ago
Remote
San Francisco Bay Area, CA
223K-302K Annually
Expert/Leader
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.
Top Skills: AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs
5 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
200K-230K Annually
Senior level
Artificial Intelligence • Machine Learning
Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.
Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks
Reposted 9 Days Ago
In-Office
San Francisco Bay Area, CA
115K-160K Annually
Entry level
Artificial Intelligence • Software • Energy • Renewable Energy
The Reliability Engineer will drive reliability for Solid-State Transformers, conduct test plans, analyze data, document risks, and support hardware deployment.
Top Skills: Ansys Sherlock, MATLAB, Python, R, ReliaSoft
Reposted 9 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Software • Generative AI
The Founding Platform & Reliability Engineer will design and operate reliable, scalable infrastructure for an AI storytelling platform, involving hands-on implementation and strategic decision-making.
Top Skills: Amplitude, AWS, Cloud Run, Firebase, GCP, Modal, Next.js, Node.js, Python, React, Redis, Sentry, TypeScript, Upstash
Reposted 10 Days Ago
In-Office
San Francisco Bay Area, CA
123K-193K Annually
Senior level
Energy
The Reliability Engineer will define reliability requirements, conduct tests, analyze designs, and collaborate on hardware reliability within energy storage products.
Top Skills: JMP, Minitab, Python