Top Reliability Engineer Jobs in San Francisco, CA
Reposted 5 Days Ago
Big Data • Cloud • Software • Database
This role involves building and maintaining observability services, ensuring service reliability, and collaborating with other teams on best practices.
Top Skills:
AWS, Fluent Bit, GCP, Jaeger, Kubernetes, Azure, Quickwit, Splunk, Vector, VictoriaMetrics
Reposted 9 Days Ago
Appliances
Lead reliability strategies for HVAC components, develop predictive models, conduct tests, analyze data, and mentor junior engineers.
Top Skills:
Accelerated Life Testing, Corrosion Testing, Environmental Chambers, HVAC Systems, Reliability Engineering, Vibration Tables, Weibull Analysis
Financial Services
Manage enterprise scheduling platforms, enhance automation with scripting, support incident response, and improve workflows with cross-functional teams.
Top Skills:
ActiveBatch, Apache Airflow, AutoSys, CA Workload Automation, Control-M, Datadog, Grafana, PowerShell, Prometheus, Splunk
Artificial Intelligence • Healthtech • Software
As a Site Reliability Engineer, you will manage cloud infrastructure, implement observability, and ensure system reliability by collaborating with engineering teams and maintaining databases.
Top Skills:
Azure, Bash, Git, Kubernetes, Postgres, Python, Redis, SQL, TypeScript, VS Code
Cloud
The role involves designing and optimizing PostgreSQL clusters, automating database tasks, and ensuring high availability and performance while collaborating with other engineering teams.
Top Skills:
Ansible, Datadog, Go, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Python, Terraform
Healthtech • Biotech
The Plant & Reliability Engineer ensures the reliable operation of critical infrastructure, leading initiatives for maintenance optimization, strategic improvement, and system ownership within a high-stakes environment.
Top Skills:
Building Automation Systems, Data Historians, Programmable Logic Controllers, SAP CMMS
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, OpenTelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
Artificial Intelligence • Hardware • Robotics • Software
The Hardware Test and Reliability Engineer will develop test plans, analyze failure modes, and ensure reliability of autonomous systems by collaborating across engineering teams.
Top Skills:
Python, SQL
Big Data • Cloud • Productivity • Software • Database • Analytics • Automation
The Site Reliability Engineer will support engineering teams, enhance system resilience, and drive scalable infrastructure practices.
Top Skills:
AWS Services, Grafana, Honeycomb, Linux, Python, Terraform
Artificial Intelligence • Fintech • Hardware • Information Technology • Sales • Software • Transportation
Design, scale, and manage AWS services for IoT devices. Collaborate on infrastructure, optimize performance, and ensure high availability of services.
Top Skills:
AWS, Bash, Go, Helm, Kubernetes, Python, Ruby, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Site Reliability Engineer will enhance CI/CD frameworks, automate cloud infrastructure, manage Kubernetes and AWS services, and ensure operational excellence.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, Git, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves supporting network infrastructure, automating cloud services, deploying Kubernetes, managing CI/CD workflows, and ensuring cloud security best practices.
Top Skills:
Ansible, AWS, Bash, Chef, Docker, Git, Go, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
Artificial Intelligence • Other • Security • Software • Analytics • Big Data Analytics
The Lead Site Reliability Engineer will oversee the reliability and scalability of the infrastructure, lead a team in operational execution, ensure best practices in SRE, and mentor senior engineers.
Top Skills:
CI/CD, Docker, GitOps, Go, Kubernetes, Linux, Python, Terraform
AdTech • Artificial Intelligence • Marketing Tech • Software • Analytics
The Senior Site Reliability Engineer will enhance system reliability, develop production-grade code, implement observability tools, conduct root cause analyses, and collaborate on system design for scalability.
Top Skills:
Argo CD, CI/CD, Docker, GitOps, Go, Grafana, Honeycomb, Jenkins, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI
As a Site Reliability Engineer, you will ensure system uptime, manage CI/CD pipelines, and enhance security and observability while troubleshooting issues in a collaborative environment.
Top Skills:
AWS, Azure, CloudFormation, Datadog, Docker, GCP, Grafana, Kubernetes, Prometheus, Terraform
Big Data • Information Technology • Productivity • Software • Analytics • Business Intelligence • Consulting
Join Celonis' Reliability Engineering team to ensure the health and performance of their platform, applying SRE principles and mentoring engineers while leading reliability efforts for microservices on Kubernetes.
Top Skills:
Argo CD, AWS, Azure, Datadog, GCP, GitHub Actions, Java, Kubernetes, Kustomize, Python, Spring Framework, Terraform
Reposted 24 Days Ago
Big Data • Cloud • Software • Database
As a Staff Engineer in the InfraSec team, you'll lead the design and deployment of security solutions for cloud platforms, automate monitoring, and manage security tooling while mentoring a small team of SREs.
Top Skills:
Ansible, AWS, Azure, CloudFormation, GCP, Go, Terraform
Reposted 15 Days Ago
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Reposted 17 Days Ago
Big Data • Fintech • Mobile • Payments • Financial Services
The Staff Software Engineer in SRE is responsible for setting technical strategy, ensuring system availability, guiding incident management, and fostering talent within the team to enhance overall system reliability.
Top Skills:
AWS, Bash, Kotlin, Kubernetes, MySQL, Python, Spark
Reposted 23 Days Ago
Energy
As a Reliability Engineer, you'll define system reliability requirements, execute various reliability tests, and analyze data to improve product performance in hardware engineering for energy storage.
Top Skills:
JMP, Minitab, Python
Fintech
The role involves creating and maintaining application infrastructure, ensuring reliability, automation, and scalability, while collaborating with development teams and managing incidents.
Top Skills:
AWS, Docker, Git, Go, Java, JavaScript, Kubernetes, Linux, Python, Ruby, Swarm
Artificial Intelligence • Machine Learning • Generative AI
As a Software Engineer for Infrastructure Reliability, you will build and operate scalable systems, optimize performance, and collaborate across teams to ensure system resilience and reliability for AI applications.
Top Skills:
AWS, Azure, CI/CD, Datadog, ELK Stack, GCP, Grafana, Kubernetes, Linux, Prometheus, Splunk, Terraform
eCommerce • Healthtech • Kids + Family • Retail • Social Media
Seeking a Senior Software Engineer, Site Reliability to ensure system stability, scalability, and reliability, while optimizing AWS infrastructure using modern DevOps practices and tools like Terraform, Docker, and Kubernetes.
Top Skills:
AWS, CircleCI, Cronitor, Datadog, Docker, GitHub Actions, Jenkins, Kubernetes, MySQL, PagerDuty, React, Redis, Ruby on Rails, Sentry, Sidekiq, Terraform
Fintech • Software
The Principal Site Reliability Engineer is responsible for maintaining cloud infrastructure, ensuring application performance, and implementing automated solutions in a SaaS environment, while collaborating with security and software engineering teams.
Top Skills:
.NET, Ansible, AppDynamics, AWS, Azure, Azure DevOps, C#, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, Terraform
Artificial Intelligence • Software
The SRE at Fluidstack is responsible for ensuring infrastructure reliability and performance, handling complex production issues, and improving platform stability.
Top Skills:
Ansible, Bash, Go, Kubernetes, Python, Slurm, Terraform