Top Senior Site Reliability Engineer Jobs in San Francisco, CA

Superhuman

Site Reliability Engineer

Reposted 8 Days Ago

Hybrid

San Francisco Bay Area, CA

214K-260K Annually

Senior level

Artificial Intelligence • Information Technology • Machine Learning • Natural Language Processing • Productivity • Software • Generative AI

The SRE will ensure the reliability of backend systems, scale Kubernetes-based control planes, and improve automation mechanisms while managing incident processes.

Top Skills: AWS, Azure, Docker, GCP, Java, Kubernetes, Linux, Terraform

Attain

Sr/Staff Site Reliability Engineer, Consumer Apps

Reposted 10 Days Ago

Easy Apply

In-Office

San Francisco Bay Area, CA

Mid level

AdTech

As a Site Reliability Engineer, you'll maintain the infrastructure for systems, ensure efficiency, automate processes, monitor databases, and participate in architecture discussions.

Top Skills: Amazon Kinesis, AWS Lambda, AWS SNS, BigQuery, Docker, GCP (Google Cloud Platform), GitLab, Google Cloud Functions, Google Cloud Run, Google Pub/Sub, Grafana, Istio, Kafka, Kubernetes, MySQL, Prometheus, Spanner, SQL, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Infrastructure Security

Reposted 10 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.

Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform

Deepgram

Site Reliability Engineer - AI & ML Infrastructure (Kubernetes, AWS & Terraform)

Reposted 2 Days Ago

Remote

San Francisco Bay Area, CA

150K-220K Annually

Senior level

Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI

The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.

Top Skills: AWS, Bash, Go, Kubernetes, Python, Slurm, Terraform

Runpod

Site Reliability Engineer

Reposted 4 Days Ago

Easy Apply

Remote

San Francisco Bay Area, CA

150K-200K Annually

Senior level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

As a Site Reliability Engineer, you will ensure system stability and resilience, define reliability standards, and automate operational processes while collaborating cross-functionally to improve performance and reduce incidents.

Top Skills: Bash, CI/CD, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python

Dropbox

Staff Site Reliability Engineer, Production Engineering

Reposted 4 Days Ago

Remote

San Francisco Bay Area, CA

223K-302K Annually

Expert/Leader

Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy

The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.

Top Skills: AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs

Domino Data Lab

Staff Site Reliability Engineer

5 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

200K-230K Annually

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

Zscaler

Site Reliability Engineer-SkillBridge Intern

Reposted 7 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

Internship

Cloud • Information Technology • Security • Software • Cybersecurity

This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.

Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform

GitLab

Site Reliability Engineer, Cloud Cost Utilization

Reposted 7 Days Ago

Easy Apply

Remote

San Francisco Bay Area, CA

Mid level

Cloud • Security • Software • Cybersecurity • Automation

As a Cloud Cost Utilization SRE at GitLab, you'll manage cloud spending, improve tracking and optimization of cloud usage, and collaborate with finance and engineering teams to enhance cost efficiency across AWS and GCP.

Top Skills: Ansible, AWS, ELK, GCP, Grafana, Loki, Mimir, Prometheus, Tempo, Terraform

Sprinter Health

Senior, Site Reliability Engineer (SRE)

Reposted 21 Days Ago

Remote or Hybrid

San Francisco Bay Area, CA

160K-235K Annually

Senior level

Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth

The Senior Site Reliability Engineer will enhance the reliability and security of infrastructure for in-home healthcare services, using cloud technology and automation to improve systems and processes.

Top Skills: AWS, Bash, GCP, Python, Terraform, TypeScript

Coinbase

Staff Site Reliability Engineer, Core AI Infrastructure

12 Days Ago

Easy Apply

Remote

San Francisco Bay Area, CA

218K-257K Annually

Senior level

Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3

Own reliability, monitoring, and incident response for AI infrastructure; build automation and CI/CD tooling; manage Kubernetes/Docker production workloads; partner with infrastructure, security, and compliance; improve observability and documentation; develop internal full-stack tooling in Go or Python.

Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Log Aggregation, Network Security, Puppet, Python, Ruby, Salt, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Atlas

Reposted 14 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.

Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

Sprinter Health

Staff, Site Reliability Engineer (SRE)

25 Days Ago

Remote or Hybrid

San Francisco Bay Area, CA

160K-255K Annually

Senior level

Artificial Intelligence • Healthtech • Logistics • Social Impact • Software • Telehealth

The Staff Site Reliability Engineer at Sprinter Health will enhance the reliability and security of cloud infrastructure, automate processes, and improve system observability across healthcare delivery operations.

Top Skills: Access Management, AWS, Bash, CI/CD Systems, Cloud Networking, Container Systems, GCP, Identity Management, Logging Platforms, Monitoring Platforms, Observability Platforms, Python, Secrets Management, Terraform, TypeScript

Drata

Senior Site Reliability Engineer

Reposted 3 Days Ago

Hybrid

San Francisco Bay Area, CA

167K-226K Annually

Senior level

Security • Software • Cybersecurity • Automation

As a Senior Site Reliability Engineer, you will enhance the reliability of Drata’s product teams through automation, architecture reviews, and operational excellence using cloud-native technologies.

Top Skills: AIOps, AWS, Bash, Datadog, Docker, Git, GitHub Actions, Kubernetes, Linux, MySQL, Python, Terraform

Airwallex

Senior Site Reliability Engineer, Spend

Reposted 4 Days Ago

Hybrid

San Francisco Bay Area, CA

160K-250K Annually

Senior level

Artificial Intelligence • Fintech • Payments • Business Intelligence • Financial Services • Generative AI

Lead design and delivery of scalable cloud infrastructure for the Spend product. Embed with development teams to drive reliability, performance, observability, incident response, and automation. Own SLOs, runbooks, DevOps metrics, and collaborate with central DevOps and security teams to ensure compliance and resilience. Lead infrastructure projects including new service launches, data centre migrations, and modernising data pipelines.

Top Skills: Analytics Pipelines, AWS, Data Streaming, DevOps, GCP, Incident Response, Kubernetes, Observability, SLOs, SRE

Plenful

Site Reliability Engineer

Reposted 21 Hours Ago

In-Office

San Francisco Bay Area, CA

Senior level

Artificial Intelligence • Healthtech

The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.

Top Skills: AWS, Containerized Services, Distributed Workflows, Observability Tooling, Postgres, Serverless Compute

Blaxel

Site Reliability Engineer

Reposted Yesterday

In-Office

San Francisco Bay Area, CA

175K-250K Annually

Mid level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

The Site Reliability Engineer will ensure the reliability and performance of AI infrastructure, build core systems, handle incident response, and develop automation tools.

Top Skills: AWS, Datadog, ELK, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Pulumi, Python, Rust, Terraform

Site Reliability Engineer II, tvScientific

3 Days Ago

In-Office or Remote

San Francisco Bay Area, CA

114K-235K Annually

Mid level

Social Media

Operate, scale, and improve a cloud-native platform on AWS and Kubernetes. Manage GitOps deployments with ArgoCD and Helm, provision infra with Terraform/Terragrunt, build CI/CD automation, enhance observability, respond to incidents, reduce operational toil through scripting, and collaborate with security and application teams to improve reliability and platform guardrails.

Top Skills: ArgoCD, AWS, Bash, Containers, EKS, GitHub Actions, GitOps, Helm, IAM, Kubernetes, Linux, Python, Terraform, Terragrunt

Thinking Machines Lab

Site Reliability Engineer (SRE)

Reposted 3 Days Ago

In-Office

San Francisco Bay Area, CA

350K-475K Annually

Mid level

Artificial Intelligence • Information Technology

The Site Reliability Engineer will drive reliability for the Tinker platform, focusing on incident response, monitoring, and ensuring system resilience while collaborating across teams.

Top Skills: Cloud Infrastructure, Kubernetes

Okta

Staff Site Reliability Engineer - Observability GCP

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

194K-267K Annually

Senior level

Cloud

The role involves building and managing observability infrastructure in GCP, automating deployments, and optimizing data processes for high reliability.

Top Skills: GKE, Go, GCP, Grafana, Kubernetes, OpenTelemetry, Python, Ruby, Splunk, Terraform

Order.co

Senior Site Reliability Engineer

Reposted 21 Hours Ago

Remote or Hybrid

San Francisco Bay Area, CA

175K-200K Annually

Senior level

eCommerce • Fintech • Payments • Software

The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.

Top Skills: AWS, CloudFormation, Datadog, Kubernetes, OpenTelemetry, Ruby, Ruby on Rails, Terraform

E2B

SRE/Infrastructure Engineer

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

200K-350K Annually

Senior level

Artificial Intelligence

The SRE/Infrastructure Engineer will manage Terraform and Kubernetes across cloud platforms, ensuring scalable infrastructure. Responsibilities include multi-cloud deployments, observability, and creating reusable components.

Top Skills: AWS, Azure, Cloudflare, GCP, Kubernetes, Terraform

The Walt Disney Company

Sr Principal Site Reliability Engineer

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

251K-336K Annually

Senior level

Digital Media • Gaming • News + Entertainment • Sports

As a Sr Principal Site Reliability Engineer, you will ensure maximum platform availability, lead incident response processes, drive automation, and collaborate across teams to optimize system performance and operational efficiency.

Top Skills: Automation Tools, Cloud Technologies, Content Delivery Networks, Media Streaming Technologies, Monitoring Tools

C3 AI

Senior/Lead Site Reliability Engineer – Federal

Reposted 4 Days Ago

In-Office

San Francisco Bay Area, CA

159K-230K Annually

Senior level

Artificial Intelligence • Big Data • Machine Learning • Software

The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.

Top Skills: Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

Reposted 23 Days Ago

Easy Apply

Remote or Hybrid

San Francisco Bay Area, CA

126K-248K Annually

Senior level

Big Data • Cloud • Software • Database

The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.

Top Skills: AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS