Top Reliability Engineer Jobs in San Francisco, CA
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI
Own and improve critical production services end-to-end by writing production-quality code: instrumenting services, eliminating performance bottlenecks, building deployment and observability platforms, defining SLOs, running incident response and post-mortems, capacity planning and cost optimization, maintaining CI/CD, and embedding with product teams to design reliable systems.
Top Skills:
AWS, C++, CI/CD, Container Orchestration, Go, Observability Stacks, Python, Rust
Food
The Reliability Engineer will manage maintenance of fixed assets, focusing on equipment reliability, predictive maintenance, and collaboration to reduce downtime and improve performance metrics of packaging operations.
Top Skills:
Automation Equipment, Thermoforming Packaging Machines, TPM
eCommerce • Retail • Software
The Senior Database Reliability Engineer ensures database availability, reliability, and efficiency, driving initiatives for upgrades, automation, and security while mentoring team members.
Top Skills:
AWS, DynamoDB, Elasticsearch, MongoDB, MySQL, Postgres, PowerShell, Python, Redis, SQL Server
Artificial Intelligence • Software
The Lead Infrastructure and Reliability Engineer will enhance GPU operations, define scalability strategies, and develop organizational strengths in a high-demand AI infrastructure setting.
Top Skills:
Containers, Distributed Systems, GPU, Kubernetes, Linux, Networking, Orchestration, Storage
Artificial Intelligence • Information Technology • Robotics • Software
Sieve is seeking a Founding Reliability Engineer to build and maintain infrastructure for petabyte-scale video workloads, focusing on reliability and security. Responsibilities include incident response, cloud security, and observability systems management.
Top Skills:
AWS, C++, Cloudflare, GCP, Go, OpenTelemetry, Oracle, Prometheus, Python, Rust, Terraform, VictoriaMetrics
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
The Senior Site Reliability Engineer will manage system incidents, improve monitoring and logging, optimize database infrastructure, and collaborate on scaling systems efficiently.
Top Skills:
AWS, ClickHouse, Kubernetes, MySQL, Postgres, Redis
Artificial Intelligence • Software • Energy • Renewable Energy
The Reliability Engineer will drive reliability for Solid-State Transformers, conduct test plans, analyze data, document risks, and support hardware deployment.
Top Skills:
Ansys Sherlock, MATLAB, Python, R, ReliaSoft
Artificial Intelligence • Software • Generative AI
The Founding Platform & Reliability Engineer will design and operate reliable, scalable infrastructure for an AI storytelling platform, involving hands-on implementation and strategic decision-making.
Top Skills:
Amplitude, AWS, Cloud Run, Firebase, GCP, Modal, Next.js, Node.js, Python, React, Redis, Sentry, TypeScript, Upstash
Reposted 11 Days Ago
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills:
Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform
Software
Drive reliability testing and qualification of cellular base stations, collaborating with R&D for long-term reliability and product lifecycle support.
Top Skills:
Excel, MS Office, MS Word, PTC Windchill, Python, Telcordia
Energy
The Reliability Engineer will define reliability requirements, conduct tests, analyze designs, and collaborate on hardware reliability within energy storage products.
Top Skills:
JMP, Minitab, Python
13 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Staff Site Reliability Engineer will lead AI-driven innovations, automate cloud infrastructure, implement CI/CD frameworks, and maintain operational IT support at Coinbase.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, Git, Go, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
13 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves leading AI product development, enhancing CI/CD frameworks, automating IT workflows, supporting AWS services, and driving cloud security best practices.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, Git, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
Reposted 22 Days Ago
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Database • Analytics
As a Database Reliability Engineer at ClickHouse, you'll improve reliability, manage escalation processes, support incident response, and enhance database performance while collaborating across teams.
Top Skills:
AWS, Azure, C++, ClickHouse, Google Cloud Platform, Python, Shell, SQL
Healthtech • Software
The Database Reliability Engineer manages and maintains cloud-based database infrastructures for SaaS applications, focusing on automation, process improvement, and collaboration with engineering teams.
Top Skills:
Ansible, AWS, Azure, Azure Data Factory, C#, Databricks, GCP, Git, Grafana, InfluxDB, MySQL, Postgres, PowerShell, Python, SQL, SQL Server, Terraform
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
As a Site Reliability Engineer, you will ensure the stability of Runpod's platform by defining reliability standards, enhancing observability, and automating processes to reduce operational toil.
Top Skills:
Bash, Go, Grafana, Linux, Networking, Prometheus, Python
Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics
The Site Reliability Engineer will ensure system reliability and scalability, manage infrastructure, automate tasks, and collaborate cross-functionally while mentoring junior engineers and supporting production environments.
Top Skills:
Ansible, Argo CD, Bash, Datadog, GitHub Actions, GitLab, Go, HashiCorp Consul, Helm, Kubernetes, Packer, Postgres, PowerShell, Python, SQL Server, Terraform, TypeScript
Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics
As a Senior Site Reliability Engineer, you will ensure software reliability and scalability, manage IaC and CI/CD, monitor systems, and mentor junior engineers while collaborating across teams.
Top Skills:
Ansible, Argo CD, Bash, Datadog, GitHub Actions, GitLab, Go, HashiCorp Consul, Helm, Kubernetes, Packer, Postgres, PowerShell, Python, SQL Server, Terraform, TypeScript
Artificial Intelligence • Information Technology • Software
Lead end-to-end platform reliability: define SLIs/SLOs, harden production architecture, ensure Kubernetes runtime and queue safety, run incident command for Sev1/Sev2, own observability/on-call/runbooks, and gate risky releases while delivering a prioritized reliability roadmap.
Top Skills:
BullMQ, Koa, Kubernetes, Node.js, PostGraphile, Postgres, React, Redis, TypeScript
Robotics • Pharmaceutical
The Hardware Reliability Engineer ensures the robustness of robotic systems through testing, analysis, and collaboration across teams to improve designs and reduce risks.
Top Skills:
Onshape CAD, Python
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
Lead reliability engineering for EV powertrain systems (EDU, high-voltage battery, power distribution). Define reliability targets, drive DFMEA, develop virtual and physical validation and PHM strategies, support field monitoring and corrective actions, embed durability in system design, and collaborate with suppliers and cross-functional teams to improve lifecycle reliability.
Top Skills:
PySpark, Python, SQL
Marketing Tech
The Cloud Reliability Engineer develops and deploys cloud tools, maintains systems performance, participates in incident response, and collaborates with teams. Requires DevOps experience, cloud expertise, and programming skills.
Top Skills:
AWS, Docker, Go, Google BigQuery, GCP, Kubernetes, Python, SQL, Terraform
Food • Marketing Tech • Manufacturing
The Senior Reliability Engineer enhances equipment reliability, reduces downtime, and improves maintenance strategies across production. This role involves collaboration with engineering and operations, leading reliability programs, and mentoring junior staff.
Top Skills:
Advanced Analytics, CMMS, Digital Tools, Reliability Modeling Tools
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
The Design Reliability Engineer will establish reliability targets, lead DFMEA processes, develop test plans, and implement monitoring systems for sensors and automotive electronics.
Top Skills:
NumPy, Pandas, PySpark, Python, SciPy