Top Reliability Engineer Jobs in San Francisco, CA

Upstart

Senior Software Engineer, Site Reliability

Reposted 17 Days Ago

Easy Apply

Remote

San Francisco Bay Area, CA

167K-231K Annually

Senior level

Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software

Lead technical direction for software architecture and cross-team initiatives focusing on scaling consumer-facing systems and maximizing loan originations while maintaining compliance and system integrity.

Top Skills: AWS, CI/CD, Docker, GitHub Actions, Infrastructure as Code, React, Ruby on Rails

Eight Sleep

Senior Reliability Engineer

23 Days Ago

In-Office

San Francisco Bay Area, CA

150K-180K Annually

Senior level

Hardware • Healthtech • Machine Learning • Software

Lead reliability engineering for electromechanical systems, including testing and validation of hardware to ensure performance and durability.

Top Skills: Accelerated Life Testing, FMEA, HALT/HASS, JMP, LabVIEW, MATLAB, Python, SPC, Weibull Analysis

Plenful

Site Reliability Engineer

Reposted 22 Hours Ago

In-Office

San Francisco Bay Area, CA

Senior level

Artificial Intelligence • Healthtech

The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.

Top Skills: AWS, Containerized Services, Distributed Workflows, Observability Tooling, Postgres, Serverless Compute

Peak Energy

Staff Reliability Engineer

Reposted 23 Days Ago

In-Office

San Francisco Bay Area, CA

180K-230K Annually

Senior level

Energy • Renewable Energy

The Staff Reliability Engineer will ensure hardware reliability in high-voltage electronics, develop reliability test programs, and collaborate on design and testing across teams.

Top Skills: HV Electronics, Power Conversion, Python

Blaxel

Site Reliability Engineer

Reposted Yesterday

In-Office

San Francisco Bay Area, CA

175K-250K Annually

Mid level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

The Site Reliability Engineer will ensure the reliability and performance of AI infrastructure, build core systems, handle incident response, and develop automation tools.

Top Skills: AWS, Datadog, ELK, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Pulumi, Python, Rust, Terraform

You.com

Senior Site Reliability Engineer

Reposted Yesterday

In-Office

San Francisco Bay Area, CA

195K-240K Annually

Senior level

Software

The Site Reliability Engineer will enhance reliability, observability, and incident response of You.com's production services, while collaborating with teams to implement best practices and improve operational efficiency through tooling and automation.

Top Skills: AWS, Bash, CI/CD, EKS, GHA, Git, Grafana, OpenTelemetry, Prometheus, Python, Terraform

Rocket Companies

Staff Infrastructure Reliability Engineer - Database & Storage

Reposted 24 Days Ago

In-Office or Remote

San Francisco Bay Area, CA

180K-279K Annually

Senior level

Fintech • Financial Services

The Staff Infrastructure Reliability Engineer leads Redfin's production database and storage systems, collaborating on strategies for reliability, scalability, and performance, while mentoring engineers and guiding complex technical discussions.

Top Skills: AWS, AWS Aurora, AWS RDS, AWS S3, DynamoDB, ElastiCache, OpenSearch, Postgres, Python, RDBMS

Prosper

Sr. Site Reliability Engineer

Reposted 2 Days Ago

Hybrid

San Francisco Bay Area, CA

163K-203K Annually

Senior level

Fintech

As a Senior Site Reliability Engineer, you will ensure the reliability, scalability, and security of Prosper's Cloud Platform while designing AI-assisted operations and mentoring junior engineers.

Top Skills: APM, CI/CD, Cloud, Infrastructure as Code, Kubernetes

Autodesk

Senior Site Reliability Engineer

3 Days Ago

In-Office

San Francisco Bay Area, CA

117K-209K Annually

Senior level

Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial

Lead reliability for Autodesk GovCloud services by deploying, operating, and automating production systems. Define SLOs/SLIs, build observability and automation, run incident response and on-call rotation, ensure compliance (FedRAMP), perform resilience testing and toil reduction, and collaborate across engineering, security, and platform teams to improve service reliability and operability.

Top Skills: APIs, AWS, AWS GovCloud, Azure, Bash, Caching Technologies, CI/CD, CloudWatch, Containers, Databases, Datadog, DNS, Dynatrace, FedRAMP, Go, IL4, IL5, Infrastructure as Code, Java, Kubernetes, Load Balancing, Messaging Systems, Networking, PowerShell, Python, Splunk, Storage Platforms

Site Reliability Engineer II, tvScientific

3 Days Ago

In-Office or Remote

San Francisco Bay Area, CA

114K-235K Annually

Mid level

Social Media

Operate, scale, and improve a cloud-native platform on AWS and Kubernetes. Manage GitOps deployments with ArgoCD and Helm, provision infra with Terraform/Terragrunt, build CI/CD automation, enhance observability, respond to incidents, reduce operational toil through scripting, and collaborate with security and application teams to improve reliability and platform guardrails.

Top Skills: ArgoCD, AWS, Bash, Containers, EKS, GitHub Actions, GitOps, Helm, IAM, Kubernetes, Linux, Python, Terraform, Terragrunt

Filevine

Senior Database Reliability Engineer

Reposted 16 Days Ago

Remote

San Francisco Bay Area, CA

145K-180K Annually

Senior level

Legal Tech • Software

Lead automation and optimization of Filevine's data platform: performance tune MSSQL/Postgres, optimize Snowflake, provision infrastructure with Terraform/AWS, run stateful containers on Kubernetes, integrate AI/LLM and MCP for operational automation, manage CI/CD, capacity planning, documentation, and serve in 24/7 on-call rotation.

Top Skills: AWS, C#, Dapper, Docker, DynamoDB, Entity Framework, GitLab, Kubernetes, LLMs, MCP (Model Context Protocol), Microsoft SQL Server (MSSQL), Octopus Deploy, OpenSearch, Postgres, PowerShell, Python, Redis, Snowflake, Terraform

Thinking Machines Lab

Site Reliability Engineer (SRE)

Reposted 3 Days Ago

In-Office

San Francisco Bay Area, CA

350K-475K Annually

Mid level

Artificial Intelligence • Information Technology

The Site Reliability Engineer will drive reliability for the Tinker platform, focusing on incident response, monitoring, and ensuring system resilience while collaborating across teams.

Top Skills: Cloud Infrastructure, Kubernetes

Carta

Senior Site Reliability Engineer

Reposted 3 Days Ago

Hybrid

San Francisco Bay Area, CA

182K-225K Annually

Senior level

Fintech • Software

As a Senior Site Reliability Engineer, you'll build and scale internal platform offerings, design monitoring systems, and collaborate with software engineers to ensure application performance and reliability.

Top Skills: Ansible, AWS, CloudFormation, Datadog, Docker, ELK Stack, Grafana, gRPC, Java, Kubernetes, Postgres, Prometheus, Python, Terraform

Okta

Staff Site Reliability Engineer - Observability GCP

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

194K-267K Annually

Senior level

Cloud

The role involves building and managing observability infrastructure in GCP, automating deployments, and optimizing data processes for high reliability.

Top Skills: GKE, Go, GCP, Grafana, Kubernetes, OpenTelemetry, Python, Ruby, Splunk, Terraform

Anyscale

Senior Site Reliability Engineer, Platform Infrastructure (Foundations)

4 Days Ago

Hybrid

San Francisco Bay Area, CA

Senior level

Artificial Intelligence • Software

Design, build, and scale control- and data-plane infrastructure for distributed AI workloads. Improve reliability, performance, scheduling, and observability for Ray clusters across cloud and on-prem environments. Support accelerator integration, container image management, and provide on-call troubleshooting and cross-team collaboration.

Top Skills: AWS, Azure, Containers, GCP, Go, GPUs, Grafana, Kubernetes, Linux, Prometheus, Python, Ray, TPUs, VMs

Zingtree

Senior DevOps / Platform Reliability Engineer

Reposted 17 Days Ago

Remote

San Francisco Bay Area, CA

Senior level

Software

As a Senior DevOps / Platform Reliability Engineer, you will manage CI/CD pipelines, automate infrastructure, operate Kubernetes, and enhance observability while ensuring security and compliance for enterprise systems.

Top Skills: Argo CD, Aurora MySQL, AWS, Bash, CloudFormation, EKS, ElastiCache, GitHub Actions, Grafana, Kubernetes, Linux, MSK, OpenTelemetry, Prometheus, Python, S3, Terraform

E2B

SRE/Infrastructure Engineer

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

200K-350K Annually

Senior level

Artificial Intelligence

The SRE/Infrastructure Engineer will manage Terraform and Kubernetes across cloud platforms, ensuring scalable infrastructure. Responsibilities include multi-cloud deployments, observability, and creating reusable components.

Top Skills: AWS, Azure, Cloudflare, GCP, Kubernetes, Terraform

The Walt Disney Company

Sr Principal Site Reliability Engineer

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

251K-336K Annually

Senior level

Digital Media • Gaming • News + Entertainment • Sports

As a Sr Principal Site Reliability Engineer, you will ensure maximum platform availability, lead incident response processes, drive automation, and collaborate across teams to optimize system performance and operational efficiency.

Top Skills: Automation Tools, Cloud Technologies, Content Delivery Networks, Media Streaming Technologies, Monitoring Tools

C3 AI

Senior/Lead Site Reliability Engineer – Federal

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

159K-230K Annually

Senior level

Artificial Intelligence • Big Data • Machine Learning • Software

The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.

Top Skills: Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform

Zilliz

Senior Software Engineer, Cloud Reliability

Reposted 4 Days Ago

Hybrid

San Francisco Bay Area, CA

175K-225K Annually

Senior level

Artificial Intelligence • Machine Learning • Database

The role involves ensuring the reliability and performance of distributed database systems, developing monitoring strategies, and automating operations in a cloud-native environment.

Top Skills: Ansible, Argo, AWS, Azure, Docker, GCP, GitLab CI, Go, Java, Jenkins, Kubernetes, Python, Terraform

Optura

Sr. Site Reliability Engineer

5 Days Ago

In-Office

San Francisco Bay Area, CA

Senior level

Artificial Intelligence • Healthtech • Software • Automation

Design, build, and operate Optura's multi-cloud, HIPAA-aware platform: run Kubernetes across cloud and customer on-prem/air-gapped environments, create unified deployment tooling (Helm/operators/GitOps), own SLOs/capacity/incident response, drive reliability, implement identity/networking/security controls, and build IaC/GitOps patterns in partnership with product and security teams.

Top Skills: AKS, Argo CD, AWS, Azure, Backstage, Cluster API, Crossplane, Distributed Tracing, EKS, GCP, GitOps, GKE, Go, Grafana, Helm, KMS, Kubernetes, mTLS, OIDC, OpenShift, OpenTelemetry, Operators, Prometheus, Pulumi, Python, Rancher, Replicated, Secrets Management, Service Mesh, Talos, Terraform, VPC

MongoDB

Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

Reposted 23 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

126K-248K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.

Top Skills: AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS

MongoDB

Staff Site Reliability Engineer, Fabric

Reposted 23 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

127K-249K Annually

Expert/Leader

Big Data • Cloud • Software • Database

Seeking a Site Reliability Engineer with expertise in networking and distributed systems for building secure multi-cloud infrastructure. Responsibilities include maintaining network architecture and ensuring reliable service-to-service communication, involving a 24/7 on-call rotation.

Top Skills: AWS, Azure, BGP, DNS, GCP, IPv6, Kubernetes, Load Balancing, mTLS, Service Mesh, TCP/IP, TLS, VPCs, VPNs

OpenAI

Software Engineer, Reliability

Reposted 5 Days Ago

In-Office

San Francisco Bay Area, CA

230K-490K Annually

Mid level

Artificial Intelligence • Machine Learning • Generative AI

The Software Engineer in Reliability will ensure system scalability, reliability, and performance, collaborating with teams to improve infrastructure and handle incidents.

Top Skills: Cloud Infrastructure, CloudFormation, Container Orchestration Platforms, Containerization Technologies, Datadog, Grafana, IaC Tools, Kubernetes, Microservices Architecture, Observability Tools, Programming Languages, Prometheus, Service Mesh Technologies, Splunk, Terraform

Latent

Site Reliability Engineer

Reposted 5 Days Ago

In-Office

San Francisco Bay Area, CA

200K-275K Annually

Senior level

Artificial Intelligence • Healthtech • Information Technology • Software

As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.

Top Skills: Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript