Top Reliability Engineer Jobs in San Francisco, CA

Reposted 2 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Software
Lead and manage engineering teams for ConductorOne's cloud infrastructure, ensuring reliability, security, and compliance while fostering team growth and culture.
Top Skills: AI, CI/CD, Cloud Infrastructure, Kubernetes, Security Compliance (SOC 2, ISO 27001)
Reposted 21 Days Ago
Remote
San Francisco Bay Area, CA
148K-195K Annually
Mid level
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
The Site Reliability Engineer will build and maintain infrastructure, improve software systems, develop scalable microservices, and ensure quality software delivery.
Top Skills: AWS, Go, Google Cloud Platform, Java, Kubernetes, Azure, SQL
Reposted 3 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
150K-199K Annually
Senior level
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills: AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
4 Days Ago
In-Office
San Francisco Bay Area, CA
192K-308K Annually
Senior level
Fintech • Information Technology • Payments
Lead software engineering initiatives for Middleware Reliability Engineering by automating processes, enhancing system reliability, and promoting DevOps practices, impacting global payment systems.
Top Skills: Ansible, AWS, Azure, Docker, ELK, GCP, Git, Go, Grafana, Java, Jenkins, Kubernetes, Prometheus, Python, Terraform
4 Days Ago
Easy Apply
In-Office
San Francisco Bay Area, CA
Mid level
Robotics
The Site Reliability Engineer will design and operate scalable systems, own cloud infrastructure, implement observability tools, and ensure production excellence.
Top Skills: AWS, Azure, Datadog, GCP, Kubernetes, Prometheus, Splunk, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
180K-200K Annually
Senior level
Productivity
The Senior Site Reliability Engineer will enhance site reliability through monitoring, optimizing infrastructure, collaborating on engineering projects, and ensuring systems’ stability.
Top Skills: AWS, Docker, Kubernetes, Temporal
Reposted 4 Days Ago
Easy Apply
In-Office
San Francisco Bay Area, CA
Senior level
Software • Generative AI
As a Site Reliability Engineer at Fireworks AI, you'll ensure system reliability, manage incidents, develop monitoring solutions, and reduce operational toil, while collaborating with software engineers to embed reliability in the development lifecycle.
Top Skills: AWS, Azure, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Python
Reposted 4 Days Ago
Hybrid
San Francisco Bay Area, CA
175K-225K Annually
Senior level
Artificial Intelligence • Machine Learning • Database
The role involves ensuring the reliability and performance of distributed database systems, developing monitoring strategies, and automating operations in a cloud-native environment.
Top Skills: Ansible, Argo, AWS, Azure, Docker, GCP, GitLab CI, Go, Java, Jenkins, Kubernetes, Python, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
255K-490K Annually
Mid level
Artificial Intelligence • Machine Learning • Generative AI
The Software Engineer in Reliability will ensure system scalability, reliability, and performance, collaborating with teams to improve infrastructure and handle incidents.
Top Skills: Cloud Infrastructure, CloudFormation, Container Orchestration Platforms, Containerization Technologies, Datadog, Grafana, IaC Tools, Kubernetes, Microservices Architecture, Observability Tools, Programming Languages, Prometheus, Service Mesh Technologies, Splunk, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
240K-401K Annually
Senior level
Software
Responsible for deploying observability platforms and automating their operation, developing software for system reliability, and leading cross-team collaboration on monitoring solutions.
Top Skills: Ansible, Go, Kubernetes, Prometheus, PromQL, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
159K-230K Annually
Senior level
Artificial Intelligence • Big Data • Machine Learning • Software
The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.
Top Skills: Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform
5 Days Ago
In-Office
San Francisco Bay Area, CA
172K-258K Annually
Expert/Leader
Fintech
The Principal Site Reliability Engineer designs and implements software to enhance application performance and resilience while ensuring security standards. Responsibilities include automating application management, providing observability, and leading cross-functional teams. Mentorship and on-call rotation participation are expected.
Top Skills: Aurora, AWS, Chef, Docker, DynamoDB, Git, Go, Java, Jenkins, JMS, Kafka, Kubernetes, Maven, Memcached, Oracle, Python, Redis, SQS, Swarm
Reposted 5 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
150K-180K Annually
Senior level
Fintech • Software
The SRE is responsible for building cloud-native platforms, improving application reliability, and fostering collaboration within teams.
Top Skills: CI/CD, Kubernetes, OpenShift, OpenStack, Prometheus, Splunk, VMware
5 Days Ago
In-Office
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills: AWS, Azure, C++, GCP, Go, Kubernetes, OCI
5 Days Ago
In-Office
San Francisco Bay Area, CA
127K-192K Annually
Senior level
Big Data • Cloud • Marketing Tech • Social Impact • Software
As a Senior Site Reliability Engineer, you will support product deployments, provide engineering support, maintain systems, and collaborate with teams globally to enhance infrastructure reliability.
Top Skills: AWS, Cassandra, CircleCI, DynamoDB, GCP, Go, Jenkins, Kubernetes, NoSQL Databases, Python, ScyllaDB, SingleStore DB, Terraform
Reposted 24 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
118K-231K Annually
Senior level
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Reposted 6 Days Ago
In-Office
San Francisco Bay Area, CA
250K-400K Annually
Senior level
Artificial Intelligence • Software
As a Senior/Staff Network Reliability Engineer, you'll optimize and maintain Fluidstack's network platform, ensuring performance and reliability for AI and HPC workloads. Responsibilities include tuning networking protocols, deploying and validating switches, automating telemetry, conducting root-cause analyses, and collaborating with vendors.
Top Skills: BGP, DPDK, eBPF, EVPN, Geneve, Go, Python, RDMA, Rust, TCP/IP, VXLAN, XDP
Reposted 6 Days Ago
Easy Apply
In-Office
San Francisco Bay Area, CA
150K-200K Annually
Junior
Artificial Intelligence • Information Technology
As a Site Reliability Engineer, maintain user-facing services, implement best practices for reliability, and manage production incidents.
Top Skills: Ansible, Cloud Services, Kubernetes, Programming Languages, Terraform
Reposted 6 Days Ago
In-Office
San Francisco Bay Area, CA
148K-205K Annually
Senior level
Artificial Intelligence • HR Tech • Professional Services
As a Senior Site Reliability Engineer, you will leverage AI tools, manage AWS cloud infrastructure, operate Kubernetes clusters, and collaborate on reliability enhancements, automating incident responses and improving metrics.
Top Skills: AWS, Datadog, Kubernetes, OpenTelemetry, Prometheus, Terraform
Reposted 6 Days Ago
In-Office
San Francisco Bay Area, CA
165K-250K Annually
Senior level
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills: Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript
Reposted 6 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
120K-160K Annually
Mid level
Consumer Web • Mobile
As a Site Reliability Engineer at Patreon, you'll improve AWS infrastructure, implement SRE practices, enhance Kubernetes capabilities, and develop automation tools.
Top Skills: Ansible, AWS, Chef, Kubernetes, Puppet, Python, Terraform
Reposted 6 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
205K-235K Annually
Senior level
Financial Services
The Senior Cluster Site Reliability Engineer will enhance the research compute cluster's uptime, reliability, and performance through engineering and operational improvements, ensuring high availability for researchers working on machine learning problems.
Top Skills: Ansible, AWS, Ceph, Docker, ELK, GCP, Grafana, Horovod, HPC, InfiniBand, Kubeflow, Kueue, Loki, Lustre, MLflow, OpenTelemetry, Podman, Prometheus, Python, RDMA, Ruby, S3, Singularity, Slurm, Terraform
Reposted 7 Days Ago
Easy Apply
In-Office or Remote
San Francisco Bay Area, CA
170K-230K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
The Senior Site Reliability Engineer is responsible for managing AI infrastructure, ensuring reliability through scalability, incident response, and collaboration with suppliers, focusing on Kubernetes and advanced GPU services.
Top Skills: Ansible, Bash, Grafana, Kubernetes, Prometheus, Python
Reposted 7 Days Ago
Easy Apply
In-Office
San Francisco Bay Area, CA
180K-440K Annually
Mid level
Information Technology
As a Site Reliability Engineer, you'll design and operate scalable storage systems and optimize performance for AI research data management.
Top Skills: Go, Kubernetes, Pulumi, Rust
Reposted 7 Days Ago
Easy Apply
In-Office
San Francisco Bay Area, CA
169K-276K Annually
Senior level
Energy
The Site Reliability Engineer will design and implement scalable systems, automate IT infrastructure management, and support deployed systems, ensuring high availability and performance.
Top Skills: Active Directory, Ansible, AWS, Azure, Chef, JSON, Linux, Puppet, Python, REST, VMware, Windows Server, YAML