Get the job you really want.

Top Senior Site Reliability Engineer Jobs in San Francisco, CA

Reposted 3 Days AgoSaved
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Senior level
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills: BlockchainGitopsInfrastructure-As-CodeKubernetesProgramming Languages
Reposted 13 Days AgoSaved
Hybrid
San Francisco Bay Area, CA
120K-140K Annually
Junior
120K-140K Annually
Junior
Artificial Intelligence • Security • Software
You will develop and improve cloud infrastructure, support distributed systems, and write infrastructure-as-code while collaborating across teams.
Top Skills: AWSCloudFormationDockerGoJavaKubernetesPythonTerraform
Reposted 4 Days AgoSaved
Easy Apply
Remote
San Francisco Bay Area, CA
Easy Apply
Senior level
Senior level
Artificial Intelligence • eCommerce • Retail
Lead the SRE and DevOps team, ensure infrastructure reliability, oversee cloud operations, drive automation, and collaborate cross-functionally.
Top Skills: AzureBashCi/CdDatadogDockerElk StackGoGrafanaKubernetesPowershellPrometheusPythonTerraform
Reposted 4 Days AgoSaved
Easy Apply
Remote
San Francisco Bay Area, CA
Easy Apply
172K-215K Annually
Senior level
172K-215K Annually
Senior level
Aerospace • Big Data • Greentech • Hardware • Social Impact
Design, deploy, and operate compute services for on-premises and cloud satellite imaging platforms. Build reproducible, scalable, highly available deployments, troubleshoot distributed systems, optimize constrained environments, document and automate operations, and participate in on-call rotations to ensure reliability for customer-facing and air-gapped deployments.
Top Skills: AlloyAnsibleBashCudaGitopsGrafanaHelmJIRAK3SKubernetesKustomizeOpentelemetryPrometheusProxmoxPythonRke2TalosTerraform
Reposted 4 Days AgoSaved
Easy Apply
Remote
San Francisco Bay Area, CA
Easy Apply
150K-185K Annually
Mid level
150K-185K Annually
Mid level
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills: AnsibleArgocdAWSAws LambdaAzureBashBitbucketC++ChefCoralogixDatadogDockerGerritGitGitlabGCPHelmJavaScriptKubernetesLinuxMySQLNew RelicNginxNode.jsPhabricatorPrometheusPuppetPythonShellSplunk
Reposted 4 Days AgoSaved
Remote
San Francisco Bay Area, CA
110K-130K Annually
Senior level
110K-130K Annually
Senior level
Real Estate • Financial Services • PropTech
As a Site Reliability Engineer, you will support AWS Cloud products, optimize processes, enhance automation, and ensure system reliability and performance.
Top Skills: ArgocdAWSAzure DevopsBashCi/CdCloudwatchDockerEksFluxcdGitKubernetesPowershellPythonSQLTerraform
Reposted 14 Days AgoSaved
In-Office
San Francisco Bay Area, CA
159K-230K Annually
Senior level
159K-230K Annually
Senior level
Artificial Intelligence • Big Data • Machine Learning • Software
The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.
Top Skills: AnsibleAWSAzureBashKubernetesLinuxPuppetPythonRubyTerraform
Reposted 14 Days AgoSaved
Easy Apply
In-Office
San Francisco Bay Area, CA
Easy Apply
Expert/Leader
Expert/Leader
Cloud • Software • Analytics
The Principal Cloud Site Reliability Engineer will lead the design and implementation of cloud infrastructure, manage CI/CD pipelines, mentor teams, and ensure secure, performant systems in AWS and Azure environments.
Top Skills: AnsibleAWSAzureBashChefDockerElkGrafanaJenkinsKubernetesMongoDBMySQLPostgresPrometheusPuppetPythonRdsSaltTerraform
5 Days AgoSaved
Easy Apply
Remote
San Francisco Bay Area, CA
Easy Apply
110K-175K Annually
Senior level
110K-175K Annually
Senior level
Cloud • Software
In this role, you'll support large-scale applications, improve observability, mentor team members, and ensure reliability by collaborating on deployments and writing automation scripts while providing 24/7 support.
Top Skills: AnsibleAWSBashConfluenceDockerElk StackGCPGitlab CicdGrafanaJenkinsJIRAKubernetesLinuxMongoDBMySQLNagiosOciPerlPostgresPrometheusPuppetPythonTerraform
Reposted 5 Days AgoSaved
Easy Apply
Remote
San Francisco Bay Area, CA
Easy Apply
170K-200K Annually
Senior level
170K-200K Annually
Senior level
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills: AWSAws MarketplaceAzureAzure MarketplaceGCPGoogle Cloud MarketplaceGrafanaKubernetesPrometheusTerraform
Reposted 6 Days AgoSaved
Remote
San Francisco Bay Area, CA
Junior
Junior
Computer Vision • Information Technology • Machine Learning • Natural Language Processing • Real Estate • Software
The SRE will maintain infrastructure for SaaS products on AWS, support developers, manage platform components, and handle IT tasks.
Top Skills: AWSComputer VisionIacLarge Language ModelsNlpTerraform
Reposted 16 Days AgoSaved
Easy Apply
In-Office
San Francisco Bay Area, CA
Easy Apply
150K-200K Annually
Junior
150K-200K Annually
Junior
Artificial Intelligence • Information Technology
As a Site Reliability Engineer, maintain user-facing services, implement best practices for reliability, and manage production incidents.
Top Skills: AnsibleCloud ServicesKubernetesProgramming LanguagesTerraform
New

Track Smarter, Apply Better.

Ditch the spreadsheets. Organize your job search with our freeApplication Tracker.

Use For Free
Application Tracker Preview
Reposted 16 Days AgoSaved
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills: AWSAzureC++GCPGoKubernetesOci
Reposted 16 Days AgoSaved
In-Office
San Francisco Bay Area, CA
200K-275K Annually
Senior level
200K-275K Annually
Senior level
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills: HelmKafkaKubernetesPostgresPythonRedisTerraformTypescript
Reposted 16 Days AgoSaved
In-Office
San Francisco Bay Area, CA
Mid level
Mid level
Information Technology • Software
As a DevOps Engineer, you'll design and scale secure systems, manage AWS environments, automate operations, and ensure operational excellence for revenue teams.
Top Skills: Amazon AuroraAWSDockerDynamoDBGithub ActionsKafkaS3SnowflakeSparkSqsTerraform
Reposted 16 Days AgoSaved
In-Office or Remote
San Francisco Bay Area, CA
205K-235K Annually
Senior level
205K-235K Annually
Senior level
Financial Services
The Senior Cluster Site Reliability Engineer will enhance the research compute cluster's uptime, reliability, and performance through engineering and operational improvements, ensuring high availability for researchers working on machine learning problems.
Top Skills: AnsibleAWSAWSCephDockerElkGCPGCPGrafanaHorovodHpcInfinibandKubeflowKueueLokiLustreMlflowOpentelemetryPodmanPrometheusPythonRdmaRubyS3SingularitySlurmTerraform
Reposted 7 Days AgoSaved
Remote
San Francisco Bay Area, CA
250K-300K Annually
Senior level
250K-300K Annually
Senior level
Artificial Intelligence • Information Technology • Machine Learning • Software • Cybersecurity • Generative AI • Data Privacy
Lead global SRE and infrastructure teams to ensure reliability, scalability, and cost-efficiency of production and developer platforms. Define cloud and Kubernetes architecture, IaC, CI/CD, SLOs/SLIs, incident management, and cloud cost optimization while partnering with Security, Product, Finance, and Engineering.
Top Skills: AIAutomationAWSCi/CdCloud-Native SystemsGCPInfrastructure As CodeKubernetesTerraform
8 Days AgoSaved
Remote
San Francisco Bay Area, CA
156K-288K Annually
Mid level
156K-288K Annually
Mid level
Computer Vision • Machine Learning • Software
As a Site Reliability Engineer, ensure the reliability, performance, and scalability of Ditto's cloud infrastructure by developing observability solutions, leading incident management, and collaborating with product engineering teams.
Top Skills: AWSAzureCDatadogGCPGoGrafanaHelmJavaKubernetesPrometheusRustTerraform
17 Days AgoSaved
Remote or Hybrid
San Francisco Bay Area, CA
160K-260K Annually
Senior level
160K-260K Annually
Senior level
Software
As a Site Reliability Engineer, you will manage the reliability and scalability of platform infrastructure, build observability tools, and automate processes to enhance operational excellence.
Top Skills: AWSGCPGoKubernetesPulumiPythonTerraform
Reposted 17 Days AgoSaved
In-Office
San Francisco Bay Area, CA
172K-258K Annually
Expert/Leader
172K-258K Annually
Expert/Leader
Fintech
The Principal Site Reliability Engineer designs and implements software to enhance application performance and resilience while ensuring security standards. Responsibilities include automating application management, providing observability, and leading cross-functional teams. Mentorship and on-call rotation participation are expected.
Top Skills: AuroraAWSChefDockerDynamo DbGitGoJavaJenkinsJmsKafkaKubernetesMavenMemcachedOraclePythonRedisSqsSwarm
8 Days AgoSaved
Easy Apply
Remote
San Francisco Bay Area, CA
Easy Apply
230K-250K Annually
Expert/Leader
230K-250K Annually
Expert/Leader
Artificial Intelligence • Healthtech • Software
The Staff Site Reliability Engineer will lead the reliability of production systems by defining SRE practices, improving observability, and ensuring fault-tolerance in cloud environments.
Top Skills: AWSGoKubernetesPostgresPythonTerraformTypescript
Reposted 8 Days AgoSaved
Remote
San Francisco Bay Area, CA
Senior level
Senior level
Digital Media • Social Media • Software • Sports
Lead the technical architecture and execution of migration to AWS, drive developer enablement, and automate infrastructure using code-first principles.
Top Skills: Aws EksDatadogGithub ActionsGoIstioK6KubernetesNode.jsTerraform
Reposted 8 Days AgoSaved
Remote
San Francisco Bay Area, CA
175K-275K Annually
Mid level
175K-275K Annually
Mid level
Software
As a Site Reliability Engineer, you'll enhance system reliability, collaborate on production readiness, define SLIs/SLOs, and improve incident response.
Top Skills: AWSDatadogGrafanaKubernetesOpentelemetryPrometheusTypescript
9 Days AgoSaved
In-Office or Remote
San Francisco Bay Area, CA
107K-221K Annually
Senior level
107K-221K Annually
Senior level
Cloud • Security • Software • Cybersecurity
The Senior Lead Site Reliability Engineer will ensure performance and uptime of security products, develop automation pipelines, and improve monitoring systems, working closely with various teams.
Top Skills: AzureDatabricksDockerGoJenkinsKubernetesPythonTerraform
Reposted 18 Days AgoSaved
In-Office
San Francisco Bay Area, CA
Senior level
Senior level
Software
Design, implement, and maintain scalable backend systems and APIs; build cloud infrastructure (preferably GCP) using Terraform; operate containerized workloads with Kubernetes; ensure reliability, security, and performance; participate in on-call rotations, architecture discussions, and cross-functional delivery.
Top Skills: Ci/CdCloud AutomationContainer OrchestrationGoGoogle Cloud PlatformIamInfrastructure As CodeKubernetesMicroservicesPythonService-Oriented ArchitectureTerraform
All Filters
New Jobs
Job Category
Experience
Industry
Company Name
Company Size

Sign up now Access later

Create Free Account