Top Senior Site Reliability Engineer Jobs in San Francisco, CA
Blockchain • Software
As a Site Reliability Engineer at Offchain Labs, you will manage infrastructure in cloud environments, design CI/CD workflows, and enhance system reliability with a focus on blockchain technology.
Top Skills:
Argo CD, AWS, Azure, CodeBuild, GCP, GitHub Actions, Go, Grafana, Kubernetes, Loki, Prometheus, Python, Terraform
Artificial Intelligence • Logistics • Software
The Site Reliability Engineer will enhance operational resilience, ensuring system stability, observability, and debugging workflows for complex failures while improving developer focus and uptime.
Top Skills:
Datadog, Go, Prometheus, Python, Sentry
Cloud • Information Technology
The Site Reliability Engineer will support IaaS services, monitor infrastructure health, perform root cause analysis, automate processes, and collaborate with teams for service reliability.
Top Skills:
Ansible, AWS, Azure, Bash, GitLab CI, Jenkins, Kubernetes, Linux, OpenShift, Python, Terraform, VMware vSphere
Fintech
As a Site Reliability Engineer, you will enhance system reliability through scalable infrastructure, observability practices, automation, and collaboration with engineering teams.
Top Skills:
AWS, Datadog, Go, Grafana, Java, Kubernetes, Node.js, Prometheus, Pulumi, Python, Terraform
Analytics
The Site Reliability Engineer will ensure the reliability and performance of IaaS services, perform incident resolution, and enhance system reliability through automation while supporting mobility across hybrid infrastructures and collaborating extensively with various teams.
Top Skills:
Ansible, AWS, Azure, Bash, GitLab CI, Jenkins, Kubernetes, Linux, OpenShift, Python, Terraform, VMware vSphere
Energy
The Site Reliability Engineer will design and implement systems, drive automation, coordinate between teams, support deployed systems, and ensure scalability for rapid growth.
Top Skills:
Active Directory, Ansible, AWS, Azure, Chef, JSON, Linux, Puppet, Python, REST, VMware, Windows Server, YAML
Software
As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.
Top Skills:
TensorRT
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Staff Software Engineer in Site Reliability, you'll manage infrastructure for reliability and scalability, lead incident management, and automate operational tasks.
Top Skills:
AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, incident.io, PagerDuty, Pulumi, Python, Sentry, Terraform
Artificial Intelligence • Legal Tech • Professional Services • Software
As a Software Engineer in Site Reliability, you will ensure the reliability and performance of our AI platform through automation and strategic infrastructure management.
Top Skills:
AWS, Azure, Bash, CloudFormation, Datadog, GCP, Go, Kubernetes, PagerDuty, Python, Sentry, Terraform
Consumer Web • eCommerce • Fashion • Retail
Seeking a Staff Software Engineer for the SRE team to enhance CI/CD systems, optimize infrastructure, and improve developer productivity. Responsibilities include architecting solutions, mentoring engineers, and driving technical initiatives to elevate operational excellence.
Top Skills:
Ansible, Bash, GitHub Actions, Go, Helm, Jenkins, Kubernetes, Python, Ruby, Spinnaker, Terraform
Artificial Intelligence • Software
As a Site Reliability Engineer at Mercor, you will ensure production reliability, develop the SRE function, and collaborate with engineering teams to maintain system performance.
Top Skills:
AWS, Kubernetes, Spacelift, Terraform
Cloud • Software
The Site Reliability Engineer (SRE) will manage reliable, scalable systems, focusing on software development, infrastructure automation, and incident response. Responsibilities include monitoring, CI/CD pipeline management, security compliance, and cost optimization while collaborating with various teams.
Top Skills:
AWS, Azure, Docker, ELK Stack, GCP, Git, Grafana, Java, Kubernetes, PHP, Prometheus, Python, Shell, Terraform
Information Technology
As a Site Reliability Engineer at New Era Technology, you'll focus on ensuring operational efficiency, creating reliable systems, and enhancing service performance through AWS expertise.
Top Skills:
AWS
Information Technology • Marketing Tech • Social Media
Lead platform engineering teams to design a secure and scalable hosting platform, integrating AI and automation while collaborating across departments.
Top Skills:
AI, Ansible, Automation, AWS, Docker, Grafana, ML, Nginx, Node.js, OpenStack, PHP, Prometheus, Terraform, WordPress
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Business Intelligence
The Senior Director of SRE leads and defines reliability and operational excellence across products, manages the SRE team, and scales reliability practices within the organization.
Top Skills:
AWS, Azure, Cloud-Native Networking, Distributed Systems, GCP, Kubernetes, Microservices, Site Reliability Engineering Principles
Cloud • Security • Software • Cybersecurity
The Principal Site Reliability Engineer will lead Veeam's global SRE efforts, focusing on architecture, reliability strategies, and mentorship while influencing cross-functional teams.
Top Skills:
Automation Tooling, Cloud Infrastructure, Cloud-Native Development, Distributed Systems
Payments
As a Principal Site Reliability Engineer, you'll architect scalable infrastructure, drive reliability, mentor engineers, and lead AI enablement efforts, ensuring high performance across systems.
Top Skills:
AWS, CI/CD, Datadog, Elasticsearch, Go, Grafana, Kubernetes, New Relic, Prometheus, Python, RDS (MySQL/Postgres), SQL-Based RDBMS, TypeScript
Artificial Intelligence • Healthtech
The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.
Top Skills:
AWS, Containerized Services, Distributed Workflows, Observability Tooling, Postgres, Serverless Compute
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills:
Argo CD, AWS, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform
Aerospace • Manufacturing
The Staff Site Reliability Engineer will design and manage Aalyria's centralized observability platform, focus on metrics, logging, and tracing systems, implement SLOs and SLIs, automate deployments, and drive incident response strategies for enhanced reliability across satellite and cloud platforms.
Top Skills:
AWS, ELK, GCP, GitOps, Go, Grafana, Jaeger, Java, Kubernetes, Loki, OpenTelemetry, Prometheus, Python, Tempo, Terraform
Logistics • Software
Seeking a Staff Site Reliability Engineer to enhance infrastructure reliability and performance using advanced engineering principles, primarily on Google Cloud Platform.
Top Skills:
Ansible, Chef, CloudFormation, Datadog, Docker, ELK, Fluentd, GitHub Actions, GitLab CI, Go, Google Cloud Platform, Grafana, Java, Jenkins, Kafka, Kubernetes, MySQL, New Relic, Postgres, Prometheus, Pub/Sub, Pulumi, Puppet, Python, Redis, Splunk, Terraform
Artificial Intelligence • Machine Learning • Generative AI
As a Site Reliability Engineer, you will manage Kubernetes clusters, automate infrastructure, improve operational metrics, and enhance reliability across data centers.
Top Skills:
CloudFormation, Go, GPU, Kubernetes, Linux, Python, Terraform
Artificial Intelligence
The Staff/Lead/Senior/Principal Site Reliability Engineer will establish SRE practices, ensure platform reliability, and support infrastructure scaling for enterprise AI workloads.
Top Skills:
AWS, Better Stack, CloudWatch, GitHub Actions, Grafana, Kubernetes, MongoDB, PagerDuty, Postgres, Prometheus, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves supporting network infrastructure, automating cloud services, deploying Kubernetes, managing CI/CD workflows, and ensuring cloud security best practices.
Top Skills:
Ansible, AWS, Bash, Chef, Docker, Git, Go, Kubernetes, Puppet, Python, Ruby, Salt, Terraform
Cloud • Fintech • Information Technology • Software • Business Intelligence
As a Site Reliability Engineer, you will ensure production system reliability, optimize performance, respond to incidents, and collaborate on infrastructure improvements.
Top Skills:
Ansible, AWS, Bash, Datadog, Docker, ELK, Git, Grafana, Kubernetes, New Relic, OpenTelemetry, Prometheus, Python, React, Ruby, Ruby on Rails, Terraform