Top Reliability Engineer Jobs in San Francisco, CA
Artificial Intelligence • Cloud • Computer Vision • Hardware • Internet of Things • Software
The Senior Hardware Reliability Engineer ensures product reliability through planning, testing, and collaboration across engineering and operations. Responsibilities include leading investigations, analyzing failure data, and designing reliability strategies throughout the product lifecycle.
Top Skills:
Environmental Testing, Failure Analysis, Firmware Engineering, Hardware Reliability, Reliability Modeling, Stress Testing
Artificial Intelligence • Machine Learning
Own and modernize Domino's Tempest scale-testing platform; build repeatable automated validation, sizing guidance, and cloud-scale test automation; partner with platform teams to enable multi-cloud scale testing and improve test reliability and reporting.
Top Skills:
CI Systems, Cloud Platforms, Cloud-Native Tooling, End-to-End Frameworks, Kubernetes, Multi-Cloud, Performance/Load Testing Frameworks, Python, Tempest
Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing
The Staff Software Engineer will develop reliability software features for autonomous vehicles, focusing on multi-sensor systems and frameworks while collaborating with cross-functional teams and driving improvements in reliability and performance.
Top Skills:
C++11, Embedded Linux, Embedded Systems, POSIX, Python
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Cloud • Software
Responsible for maintaining FedRAMP-compliant services, designing infrastructure, monitoring systems, and ensuring security for federal regions, while driving automation and collaboration with development teams.
Top Skills:
AWS, FedRAMP, Go, Kubernetes, Puppet, Python, Terraform, Unix/Linux
Cloud • Mobile • Software
Drive SRE practices and reliability strategy: implement SLIs/SLOs and error budgets, build observability (metrics, logs, traces, dashboards, alerts), evolve AWS/Terraform infrastructure, automate toil, participate in incident response, develop runbooks and safeguards, and collaborate with engineering and product teams to design and operate reliable services.
Top Skills:
AI-Assisted Tooling, AWS, Datadog, Docker, ECS, EKS, Grafana, Honeycomb, incident.io, Infrastructure as Code, Kubernetes, LLMs, New Relic, Node.js, Opsgenie, PagerDuty, Prometheus, Python, Terraform, TypeScript
Marketing Tech • Mobile • Software
As a Senior Site Reliability Engineer, you'll maintain and enhance the Currents data export system, focusing on observability, scalability, and reliability, while mentoring junior engineers and solving performance issues.
Top Skills:
Buildkite, Datadog, Docker Swarm, Git, GitLab, Java, Jenkins, Kafka, Kotlin, Kubernetes, MongoDB, PagerDuty, Postgres, Ruby, Sentry, Sidekiq, SNS, SQS
Cloud • Mobile • Software
Improve and protect production reliability and performance of AWS-based systems. Implement SRE practices (SLIs/SLOs, error budgets), build observability, automate infrastructure with Terraform, contribute code and tooling, participate in incident response, and document runbooks and best practices.
Top Skills:
AWS, Datadog, Docker, ECS, EKS, Grafana, Honeycomb, incident.io, Kubernetes, LLMs, New Relic, Node.js, Opsgenie, PagerDuty, Prometheus, Python, Terraform, TypeScript
Cloud • Hardware • Security • Software
Manage and enhance infrastructure, automate processes, define roadmaps, and support engineering teams while ensuring uptime and efficiency.
Top Skills:
Argo CD, AWS, Kubernetes, Python, Terraform
Artificial Intelligence • Information Technology • Machine Learning • Natural Language Processing • Productivity • Software • Generative AI
As a Site Reliability Engineer, you'll build software to ensure system reliability, scale infrastructure, and deploy ML systems while collaborating with cross-functional teams.
Top Skills:
AWS, Azure, Docker, GCP, Java, Kubernetes, Linux, Terraform
Food • Marketing Tech • Manufacturing
The Senior Reliability Engineer enhances equipment reliability, reduces downtime, and improves maintenance strategies across production. This role involves collaboration with engineering and operations, leading reliability programs, and mentoring junior staff.
Top Skills:
Advanced Analytics, CMMS, Digital Tools, Reliability Modeling Tools
Energy
As a Reliability Engineer, you will define reliability requirements, analyze design failures, run tests, and ensure high standards for hardware products.
Top Skills:
JMP, Minitab, Python
Fintech • Payments • Productivity • Financial Services
As a Senior Database Reliability Engineer, you will enhance database reliability and performance, develop automation tools, and support GCP persistence tools.
Top Skills:
Bash, Chef, GCP, MySQL, Perl, Puppet, Python, Ruby, Salt, Terraform
Digital Media • eCommerce • Gaming • Mobile • News + Entertainment
The Senior Software Engineer in Device Reliability will enhance the Crunchyroll app's reliability across various devices by developing automated tests and collaborating with multiple teams to ensure seamless user experience.
Top Skills:
HTML5, JavaScript, NoSQL, React, Relational Database, TypeScript
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
The Design Reliability Engineer will establish reliability targets, lead DFMEA processes, develop test plans, and implement monitoring systems for sensors and automotive electronics.
Top Skills:
NumPy, pandas, PySpark, Python, SciPy
AdTech
As a Site Reliability Engineer, you'll maintain the infrastructure for systems, ensure efficiency, automate processes, monitor databases, and participate in architecture discussions.
Top Skills:
Amazon Kinesis, AWS Lambda, AWS SNS, BigQuery, Docker, GCP (Google Cloud Platform), GitLab, Google Cloud Functions, Google Cloud Run, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, Prometheus, Spanner, SQL, Terraform
AdTech
The Site Reliability Engineer will build and maintain infrastructure, manage databases, automate operations, and ensure system efficiency and scalability at Attain.
Top Skills:
Amazon Kinesis, AWS Lambda, AWS SNS, BigQuery, Docker, GCP, GitLab, Google Cloud Functions, Google Cloud Run, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, Prometheus, Spanner, Terraform
Healthtech • Biotech
The Plant & Reliability Engineer ensures the reliable operation of critical infrastructure, leading initiatives for maintenance optimization, strategic improvement, and system ownership within a high-stakes environment.
Top Skills:
Building Automation Systems, Data Historians, Programmable Logic Controllers, SAP CMMS
Marketing Tech • Mobile • Software
As a Senior Site Reliability Engineer at Braze, you'll ensure uptime for internal services, improve automation, and develop infrastructure tools, collaborating across teams to enhance reliability and scalability.
Top Skills:
Chef, Docker, Kafka, Kubernetes, MongoDB, Redis, Ruby on Rails, Terraform
Energy
The Electrical Reliability Engineer focuses on improving equipment reliability, maintenance support, troubleshooting, budgeting for equipment replacements, and implementing reliability programs in the petroleum industry.
Top Skills:
ETAP, GE APM, PowerDB, SAP
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
Lead technical direction for software architecture and cross-team initiatives focusing on scaling consumer-facing systems and maximizing loan originations while maintaining compliance and system integrity.
Top Skills:
AWS, CI/CD, Docker, GitHub Actions, Infrastructure as Code, React, Ruby on Rails
Fintech • Software
The Senior Site Reliability Engineer keeps SaaS products fast and stable through automation, collaboration, monitoring, and the adoption of AI tools that improve performance and reliability.
Top Skills:
AI Tools, Ansible, AppDynamics, AWS, Azure, Azure DevOps, Bash, C# .NET, Cosmos, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, PowerShell, Python, SaaS, SQL, Terraform
Cloud
The role involves designing and optimizing PostgreSQL clusters, automating database tasks, and ensuring high availability and performance while collaborating with other engineering teams.
Top Skills:
Ansible, Datadog, Go, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Python, Terraform
Big Data • Cloud • Productivity • Software • Database • Analytics • Automation
The Site Reliability Engineer will automate tasks, enhance platform infrastructure, improve observability, and lead incident response efforts for optimal performance.
Top Skills:
AWS, Grafana, Honeycomb, Linux, Python, Terraform
Energy • Renewable Energy
The Staff Reliability Engineer will ensure hardware reliability in high-voltage electronics, develop reliability test programs, and collaborate on design and testing across teams.
Top Skills:
HV Electronics, Power Conversion, Python