Get the job you really want.
Maximum of 25 job preferences reached.
Top Senior Site Reliability Engineer Jobs in San Francisco, CA
Cloud • Mobile • Software
Improve and protect production reliability and performance of AWS-based systems. Implement SRE practices (SLIs/SLOs, error budgets), build observability, automate infrastructure with Terraform, contribute code and tooling, participate in incident response, and document runbooks and best practices.
Top Skills:
AWSDatadogDockerEcsEksGrafanaHoneycombIncident.IoKubernetesLlmsNew RelicNode.jsOpsgeniePagerdutyPrometheusPythonTerraformTypescript
Cloud • Hardware • Security • Software
Manage and enhance infrastructure, automate processes, define roadmaps, and support engineering teams while ensuring uptime and efficiency.
Top Skills:
ArgocdAWSKubernetesPythonTerraform
Artificial Intelligence • Information Technology • Machine Learning • Natural Language Processing • Productivity • Software • Generative AI
As a Site Reliability Engineer, you'll build software to ensure system reliability, scale infrastructure, and deploy ML systems while collaborating with cross-functional teams.
Top Skills:
AWSAzureDockerGCPJavaKubernetesLinuxTerraform
Reposted 7 Days AgoSaved
Easy Apply
Easy Apply
AdTech
As a Site Reliability Engineer, you'll maintain the infrastructure for systems, ensure efficiency, automate processes, monitor databases, and participate in architecture discussions.
Top Skills:
Amazon KinesisAws LambdaAws SnsBigQueryDockerGcp (Google Cloud Platform)GitlabGoogle Cloud FunctionsGoogle Cloud RunGoogle Pub/SubGrafanaIstioKafkaKubernetesMySQLPrometheusSpannerSQLTerraform
AdTech
The Site Reliability Engineer will build and maintain infrastructure, manage databases, automate operations, and ensure system efficiency and scalability at Attain.
Top Skills:
Amazon KinesisAws LambdaAws SnsBigQueryDockerGCPGitlabGoogle Cloud FunctionsGoogle Cloud RunGoogle Pub/SubGrafanaIstioKafkaKubernetesMySQLPrometheusSpannerTerraform
Big Data • Cloud • Productivity • Software • Database • Analytics • Automation
The Site Reliability Engineer will automate tasks, enhance platform infrastructure, improve observability, and lead incident response efforts for optimal performance.
Top Skills:
AWSGrafanaHoneycombLinuxPythonTerraform
Reposted 6 Days AgoSaved
Easy Apply
Easy Apply
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Site Reliability Engineer will enhance CI/CD frameworks, automate cloud infrastructure, manage Kubernetes and AWS services, and ensure operational excellence.
Top Skills:
AnsibleAWSBashChefCi/CdDockerGitKubernetesPuppetPythonRubySaltTerraform
Artificial Intelligence • Other • Security • Software • Analytics • Big Data Analytics
The Lead Site Reliability Engineer will oversee the Infrastructure SRE team, focusing on system reliability, automation, and mentoring while collaborating with product engineering.
Top Skills:
Ci/CdDatadogDockerElk StackGitopsGoKubernetesLinux/UnixNew RelicNoSQLPrometheusPythonSQLStackdriverTerraform
Reposted 8 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWSAzureDnsGCPGoHTTPLinuxPythonRubyTls
Reposted 18 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
As a Staff Engineer in the InfraSec team, you'll lead the design and deployment of security solutions for cloud platforms, automate monitoring, and manage security tooling while mentoring a small team of SREs.
Top Skills:
AnsibleAWSAzureCloudFormationGCPGoTerraform
12 Days AgoSaved
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI
The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.
Top Skills:
AWSBashGoKubernetesPythonSlurmTerraform
Software
As a Site Reliability Engineer, you'll build and maintain infrastructure for ML models, automate processes, and collaborate cross-functionally.
Top Skills:
Circle CiCloudFormationElk StackGithub ActionsGitlab CiGrafanaJenkinsKubernetesOpentelemetryPrometheusPulumiTerraform
New
Track Smarter, Apply Better.
Ditch the spreadsheets. Organize your job search with our freeApplication Tracker.
Use For Free
Big Data • Cloud • Software • Database
Lead a 6–8 person team managing the Kubernetes fleet and core runtime components (CoreDNS, cert-manager, Gatekeeper). Define technical vision and roadmap, guide migration from Terraform to Operator-driven lifecycle management, perform hands-on architectural reviews and PR reviews, resolve operational incidents, and collaborate with engineering leaders and stakeholders.
Top Skills:
AlertingAWSAzureCert-ManagerContainerizationCorednsCrossplaneGatekeeperGCPKubernetesLoad BalancingObservabilityOperatorsService MeshTerraform
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWSAzureCert-ManagerCorednsCrdsCriCsiGatekeeperGCPGoHelmKubernetesKustomizeOperatorsPythonTerraform
Reposted 16 Days AgoSaved
Easy Apply
Easy Apply
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills:
AnsibleAws EcsKubernetesLinuxPythonTerraform
Cloud • Software
Responsible for maintaining FedRAMP compliant services, designing infrastructure, monitoring systems, and ensuring security for federal regions, while driving automation and collaboration with development teams.
Top Skills:
AWSFedrampGoKubernetesPuppetPythonTerraformUnix/Linux
Cloud • Mobile • Software
Drive SRE practices and reliability strategy: implement SLIs/SLOs and error budgets, build observability (metrics, logs, traces, dashboards, alerts), evolve AWS/Terraform infrastructure, automate toil, participate in incident response, develop runbooks and safeguards, and collaborate with engineering and product teams to design and operate reliable services.
Top Skills:
Ai-Assisted ToolingAWSDatadogDockerEcsEksGrafanaHoneycombIncident.IoInfrastructure As CodeKubernetesLlmsNew RelicNode.jsOpsgeniePagerdutyPrometheusPythonTerraformTypescript
Marketing Tech • Mobile • Software
As a Senior Site Reliability Engineer, you'll maintain and enhance the Currents data export system, focusing on observability, scalability, and reliability, while mentoring junior engineers and solving performance issues.
Top Skills:
BuildkiteDatadogDocker SwarmGitGitlabJavaJenkinsKafkaKotlinKubernetesMongoDBPagerdutyPostgresRubySentrySidekiqSnsSqs
Cloud • Digital Media • Information Technology
Operate and improve Kubernetes-based production systems, manage cluster lifecycle and networking, build CI/CD and GitOps pipelines, define SLOs and incident response, automate resolution with AI, implement monitoring/alerting, and drive reliability through automation and chaos engineering.
Top Skills:
AnsibleArgocdBashBgpCalicoCephCiliumCni PluginsCorootDatadogDnsEbpfFalcoFluxcdGoGrafanaKubernetesLokiLonghornMetallbPrometheusPythonSIEMTerraformThanosVictoriametricsVxlanXdp
Healthtech • Insurance
The Senior Software Engineer will lead complex projects, mentor engineers, and ensure cloud infrastructure is resilient and automated. Responsibilities include developing software, managing production environments, and enforcing coding standards.
Top Skills:
ArgocdAWSGCPGithub ActionsGrafanaIstioKubernetesPrometheusTerraform
Artificial Intelligence • Software
As a Software Engineer on the Site Reliability team, you'll ensure system reliability, scalability, and observability while partnering with engineering teams and improving incident management processes.
Top Skills:
AWSCi/Cd ToolingContainer OrchestrationDatadogGrafanaPrometheusTerraform
Healthtech • Software
As a DevOps Engineer, you'll build and maintain scalable infrastructures, manage monitoring systems, provide operational support, and collaborate across teams to enhance the company's cloud environment.
Top Skills:
AnsibleAWSAzureBashChefDockerGCPGithub ActionsJenkinsPostgresPuppetPythonTerraform
Financial Services
Design, develop, and deploy robust platform solutions while ensuring reliability, scalability, and security of the system. Collaborate with teams to enhance tooling and automation.
Top Skills:
GCPKubernetesTerraform
Information Technology • Legal Tech
The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.
Top Skills:
AWSChefDatadogGoGrafanaJavaPrometheusPuppetPythonSaltTerraform
Big Data • Healthtech • HR Tech • Machine Learning • Software • Telehealth • Big Data Analytics
The Staff Site Reliability Engineer will architect, operate, and improve the platform while ensuring security compliance and enhancing development processes.
Top Skills:
AWSElasticsearchIstioKubernetesNatsNode.jsPostgresPythonReactTerraformTypescript
Popular Job Searches
All Filters
Total selected ()
No Results
No Results





.png)
.png)




.jpg)




.png)









