Top Senior Site Reliability Engineer Jobs in San Francisco, CA

Reposted 11 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills: AWS, Azure, C++, GCP, Go, Kubernetes, OCI
12 Days Ago
In-Office
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Machine Learning • Software • Industrial
The Site Reliability Engineer will own backend services and cloud infrastructure, focusing on system reliability, scalability, and operational excellence for Archetype AI's platform.
Top Skills: AWS, Azure, C++, CloudFormation, CUDA, GCP, Kubernetes, Pulumi, Python, PyTorch, Rust, Terraform
12 Days Ago
Hybrid
San Francisco Bay Area, CA
Entry level
Artificial Intelligence • Machine Learning • Biotech • Generative AI
The Site Reliability Engineer will manage digital infrastructure, ensuring access to compute resources, automating processes, and maintaining resource visibility for researchers.
Top Skills: Ansible, Docker, Grafana, Kubernetes, Prometheus, Python, Tailscale, Talos Linux
Reposted 12 Days Ago
In-Office
San Francisco Bay Area, CA
Senior level
Software
Design, implement, and maintain scalable backend systems and APIs; build cloud infrastructure (preferably GCP) using Terraform; operate containerized workloads with Kubernetes; ensure reliability, security, and performance; participate in on-call rotations, architecture discussions, and cross-functional delivery.
Top Skills: CI/CD, Cloud Automation, Container Orchestration, Go, Google Cloud Platform, IAM, Infrastructure as Code, Kubernetes, Microservices, Python, Service-Oriented Architecture, Terraform
Reposted 3 Days Ago
Remote
San Francisco Bay Area, CA
156K-288K Annually
Mid level
Computer Vision • Machine Learning • Software
As a Site Reliability Engineer, ensure the reliability, performance, and scalability of Ditto's cloud infrastructure by developing observability solutions, leading incident management, and collaborating with product engineering teams.
Top Skills: AWS, Azure, C, Datadog, GCP, Go, Grafana, Helm, Java, Kubernetes, Prometheus, Rust, Terraform
Reposted 3 Days Ago
Remote
San Francisco Bay Area, CA
Senior level
Digital Media • Social Media • Software • Sports
Lead the technical architecture and execution of migration to AWS, drive developer enablement, and automate infrastructure using code-first principles.
Top Skills: AWS EKS, Datadog, GitHub Actions, Go, Istio, k6, Kubernetes, Node.js, Terraform
Reposted 3 Days Ago
Remote
San Francisco Bay Area, CA
175K-275K Annually
Mid level
Software
As a Site Reliability Engineer, you'll enhance system reliability, collaborate on production readiness, define SLIs/SLOs, and improve incident response.
Top Skills: AWS, Datadog, Grafana, Kubernetes, OpenTelemetry, Prometheus, TypeScript
Reposted 3 Days Ago
Remote
San Francisco Bay Area, CA
Senior level
Edtech
The Lead Software Engineer will lead the SRE team, focusing on reliability, performance optimization, security, and mentoring developers, while improving overall platform resilience.
Top Skills: ActiveJob, Ansible, AWS, AWS CloudWatch, EC2, ECS, Elasticsearch, Git, GCP, Google Cloud Stackdriver, Jenkins, JIRA, Kubernetes, Memcached, MongoDB, New Relic, Node.js, Postgres, Redis, Ruby on Rails, Sidekiq, Spinnaker, Terraform, Terragrunt
13 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
170K-230K Annually
Mid level
Artificial Intelligence • Cloud • Information Technology • Software
Contribute to the reliability and performance of Mithril's GPU orchestration platform through automation, observability, and infrastructure management. Collaborate with the team to ensure scalability across multi-cloud environments while maintaining system stability and implementing SLOs.
Top Skills: AWS, Azure, GCP, Go, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus, Pulumi, Python, TCP/IP, Terraform
Reposted 13 Days Ago
Hybrid
San Francisco Bay Area, CA
170K-240K Annually
Senior level
Artificial Intelligence • Information Technology • Software
The Site Reliability Engineer will ensure high availability and performance of CodeRabbit's AI-powered code review platform, enhancing system reliability through infrastructure ownership, performance engineering, and automation.
Top Skills: AWS, Datadog, Docker, ELK Stack, Google Cloud Platform, Grafana, Kubernetes, Linux, Node.js, Prometheus, Terraform, TypeScript
Reposted 9 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
130K-140K Annually
Senior level
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
The Senior Site Reliability Engineer will manage system incidents, improve monitoring and logging, optimize database infrastructure, and collaborate on scaling systems efficiently.
Top Skills: AWS, ClickHouse, Kubernetes, MySQL, Postgres, Redis
Reposted 4 Days Ago
Remote
San Francisco Bay Area, CA
Senior level
Healthtech
Develop and implement processes to ensure high availability and reliability of services. Responsibilities include incident management, automation, capacity planning, and risk mitigation.
Top Skills: AWS, Azure, Datadog, Docker, Grafana, JavaScript, New Relic, Prometheus, Python, Ruby, Splunk, Terraform
Reposted 4 Days Ago
Remote
San Francisco Bay Area, CA
190K-215K Annually
Senior level
Internet of Things • Cybersecurity
The Site Reliability Engineer will manage AWS GovCloud infrastructure, ensuring compliance and high availability while driving automation, security, and incident response best practices.
Top Skills: Ansible, AWS GovCloud, Bash, Docker, ELK Stack, GitLab CI/CD, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform
Reposted 14 Days Ago
Hybrid
San Francisco Bay Area, CA
30K-120K Annually
Senior level
Information Technology • Automation
The SRE/Infrastructure Engineer will architect and manage secure, scalable systems for automated penetration testing, optimizing reliability, and enhancing infrastructure based on customer demand. Responsibilities include maintaining production environments, leading technical discussions, and promoting high coding standards.
Top Skills: AWS, Azure, CloudFormation, ELK, GCP, New Relic, OpenTelemetry, Postgres, Prometheus, Terraform
Reposted 15 Days Ago
Hybrid
San Francisco Bay Area, CA
175K-320K Annually
Mid level
Artificial Intelligence • Software
The SRE at Fluidstack is responsible for ensuring infrastructure reliability and performance, handling complex production issues, and improving platform stability.
Top Skills: Ansible, Bash, Go, Kubernetes, Python, Slurm, Terraform
Reposted 15 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
200K-260K Annually
Senior level
Artificial Intelligence • Software • Generative AI
The Lead Site Reliability Engineer will drive technical strategy, ensure high service availability, manage cloud infrastructure, and lead a team to optimize systems and automate processes.
Top Skills: AWS, Azure, Docker, Google Cloud Platform, Kubernetes, Terraform
7 Days Ago
Remote
San Francisco Bay Area, CA
100K-110K Annually
Mid level
Healthtech • Software
The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.
Top Skills: AI Bots, Datadog, JIRA, Jira Service Management, MS Teams, Opsgenie, PagerDuty
13 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves leading AI product development, enhancing CI/CD frameworks, automating IT workflows, supporting AWS services, and driving cloud security best practices.
Top Skills: Ansible, AWS, Bash, Chef, CI/CD, Docker, Git, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
Reposted 7 Days Ago
Remote
San Francisco Bay Area, CA
200K-250K Annually
Senior level
Software • Cryptocurrency
Manage and scale Kubernetes clusters, automate infrastructure, optimize performance, maintain blockchain nodes, and improve system reliability while collaborating with product teams.
Top Skills: AWS EC2, AWS EKS, Datadog, Docker, IAM, Kubernetes, OpenTelemetry, Pulumi, RDS, S3, Terraform
Reposted 7 Days Ago
Remote
San Francisco Bay Area, CA
165K-200K Annually
Senior level
Cloud • Information Technology
As a Staff Site Reliability Engineer, you will enhance cloud product lines, ensuring real-time scalability, collaborating with teams, and automating builds.
Top Skills: Ansible, AWS, Azure, Bash, DNS, Docker, Envoy, GCP, Git, Go, Grafana, HAProxy, HTTP, Jenkins, Kafka, Kubernetes, Linux, MySQL, OCI, OpenTelemetry, Postgres, Prometheus, Puppet, Python, Redis, TCP/IP, Telegraf, Terraform, TLS
Reposted 7 Days Ago
Remote
San Francisco Bay Area, CA
Senior level
Software
As a Site Reliability Engineer, you will enhance system reliability, manage cloud services, respond to incidents, and support network systems.
Top Skills: Automation, Cisco Routing, Cloud Services, F5 Load Balancing, Fortinet Firewalls, Infrastructure Automation, Monitoring, Networking
Reposted 22 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
8 Days Ago
Remote
San Francisco Bay Area, CA
95K-110K Annually
Junior
Cloud • Security • Cybersecurity
As a Junior Site Reliability Engineer, you will support cloud operations, implement automation for cloud infrastructure, and ensure system reliability and security.
Top Skills: Ansible, AWS, Azure, Bash, Elastic Stack, GCP, JIRA, PowerShell, Python, ServiceNow, Splunk, Terraform
Reposted 17 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
250K-295K Annually
Senior level
Artificial Intelligence • Software
As a Senior Staff SRE Tech Lead, you'll oversee reliability and scalability, mentor engineers, optimize systems, and enhance data infrastructure.
Top Skills: ClickHouse, Go, Postgres, Python, TypeScript
Reposted 17 Days Ago
In-Office
San Francisco Bay Area, CA
116K-200K Annually
Mid level
Information Technology • Mobile • Software
As a Site Reliability Engineer, you'll ensure system reliability and scalability, automate processes, optimize performance, and collaborate on system design.
Top Skills: AWS, Azure, Bash, CloudFormation, Datadog, Docker, ELK, Go, Google Cloud Platform, Grafana, Helm, Kubernetes, New Relic, Prometheus, Pulumi, Python, Terraform