Top Senior Site Reliability Engineer Jobs in San Francisco, CA
Cloud • Software
Responsible for maintaining FedRAMP-compliant infrastructure, collaborating with software engineers, and ensuring system availability and security. Duties include infrastructure design, automation, monitoring, and incident response.
Top Skills:
AWS, Go, Kubernetes, Puppet, Python, Terraform
Software • Cryptocurrency
Manage and scale Kubernetes clusters, automate infrastructure, optimize performance, maintain blockchain nodes, and improve system reliability while collaborating with product teams.
Top Skills:
AWS (EC2, IAM), AWS EKS, Datadog, Docker, Kubernetes, OpenTelemetry, Pulumi, RDS, S3, Terraform
Healthtech • Software
The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.
Top Skills:
AI Bots, Datadog, Jira, Jira Service Management, MS Teams, Opsgenie, PagerDuty
Digital Media • Gaming • News + Entertainment • Sports
As a Sr Principal Site Reliability Engineer, you will ensure maximum platform availability, lead incident response processes, drive automation, and collaborate across teams to optimize system performance and operational efficiency.
Top Skills:
Automation Tools, Cloud Technologies, Content Delivery Networks, Media Streaming Technologies, Monitoring Tools
Software
Join a fast-growing team as a Platform Engineer to enhance AI control systems, ensuring reliability and performance while collaborating on product decisions.
Top Skills:
AI, Developer Tools, Infrastructure
Artificial Intelligence
The SRE/Infrastructure Engineer will manage Terraform and Kubernetes across cloud platforms, ensuring scalable infrastructure. Responsibilities include multi-cloud deployments, observability, and creating reusable components.
Top Skills:
AWS, Azure, Cloudflare, GCP, Kubernetes, Terraform
Enterprise Web • Information Technology • Software
Join a passionate team as a Platform Engineer (SRE), focusing on improving reliability, performance, and availability of AI control plane products. Collaborate closely on operational processes and foster reliability culture across engineering.
Top Skills:
AI, Developer Tools, Infrastructure, MCP, Monitoring Systems, Observability, Secure Access, Security
Artificial Intelligence • Fintech • Payments • Business Intelligence • Financial Services • Generative AI
The Senior Site Reliability Engineer will architect and implement scalable cloud infrastructure, lead incident response, and ensure system reliability for product initiatives.
Top Skills:
AWS, Cloud Infrastructure, GCP, Kubernetes
Software
As a Senior Site Reliability Engineer, you will ensure the reliability and scalability of production systems, improve system performance, and enhance observability through design and automation.
Top Skills:
AWS, CloudWatch, Datadog, Grafana, Prometheus, Terraform
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills:
Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
Artificial Intelligence • Big Data • Machine Learning • Software
The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.
Top Skills:
Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform
Cloud • Information Technology
As a Staff Site Reliability Engineer, you will enhance cloud product lines, ensuring real-time scalability, collaborating with teams, and automating builds.
Top Skills:
Ansible, AWS, Azure, Bash, DNS, Docker, Envoy, GCP, Git, Go, Grafana, HAProxy, HTTP, Jenkins, Kafka, Kubernetes, Linux, MySQL, OCI, OpenTelemetry, Postgres, Prometheus, Puppet, Python, Redis, TCP/IP, Telegraf, Terraform, TLS
Cloud • Security • Cybersecurity
As a Junior Site Reliability Engineer, you will support cloud operations, implement automation for cloud infrastructure, and ensure system reliability and security.
Top Skills:
Ansible, AWS, Azure, Bash, Elastic Stack, GCP, Jira, PowerShell, Python, ServiceNow, Splunk, Terraform
Software
As a Site Reliability Engineer, you will enhance system reliability, manage cloud services, respond to incidents, and support network systems.
Top Skills:
Automation, Cisco Routing, Cloud Services, F5 Load Balancing, Fortinet Firewalls, Infrastructure Automation, Monitoring, Networking
Artificial Intelligence • Cloud • Information Technology • Software
As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.
Top Skills:
Ansible, CUDA, Go, Helm, Kubernetes, Linux, NCCL, NVIDIA, Python, Rust, Slurm, Terraform
Artificial Intelligence • Information Technology • Machine Learning • Natural Language Processing • Productivity • Software • Generative AI
The SRE will ensure the reliability of backend systems, scale Kubernetes-based control planes, and improve automation mechanisms while managing incident processes.
Top Skills:
AWS, Azure, Docker, GCP, Java, Kubernetes, Linux, Terraform
Fintech • Professional Services • Software
As a Senior Site Reliability Engineer, you'll design scalable systems on AWS, mentor engineers, manage incident responses, and enhance the reliability of fintech infrastructure.
Top Skills:
Spark, AWS, DevOps, Java, Kubernetes, Terraform
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
The Senior Site Reliability Engineer will manage system incidents, improve monitoring and logging, optimize database infrastructure, and collaborate on scaling systems efficiently.
Top Skills:
AWS, ClickHouse, Kubernetes, MySQL, Postgres, Redis
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills:
Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills:
AWS, Azure, C++, GCP, Go, Kubernetes, OCI
Healthtech • Information Technology • Telehealth
The Lead Site Reliability Engineer is responsible for the reliability, automation, and performance of cloud services while mentoring a team and collaborating cross-functionally. The role also drives initiatives to improve incident management and enforce security compliance.
Top Skills:
Ansible, AWS, AWS CloudFormation, Azure, Bash, Datadog, Docker, ELK Stack, Go, GCP, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform
Fintech • Real Estate
The Senior Site Reliability Engineer executes reliability strategies, designs and maintains infrastructure, improves monitoring and deployment processes, and collaborates with teams on system reliability and performance optimization.
Top Skills:
Automated Configuration Management, Automated Provisioning, AWS, Azure, Azure Storage, Cloud-Based Solutions, Containerization Solutions, GCP, Git, Jira, Linux, MariaDB, MySQL, RDS, SQL Server, Unix, Windows
Software
The Site Reliability Engineer will enhance reliability, observability, and incident response of You.com's production services, while collaborating with teams to implement best practices and improve operational efficiency through tooling and automation.
Top Skills:
AWS, Bash, CI/CD, EKS, GHA, Git, Grafana, OpenTelemetry, Prometheus, Python, Terraform
Other
The Senior Site Reliability Engineer at Juul Labs ensures operational stability and performance of hybrid cloud infrastructure, leads automation, and handles critical incidents.
Top Skills:
AWS, Bash, CloudFormation, GCP, Nutanix, PowerShell, Python, Terraform
Artificial Intelligence • Machine Learning • Biotech • Generative AI
The Site Reliability Engineer will manage digital infrastructure, ensuring access to compute resources, automating processes, and maintaining resource visibility for researchers.
Top Skills:
Ansible, Docker, Grafana, Kubernetes, Prometheus, Python, Tailscale, Talos Linux