Top Senior Site Reliability Engineer Jobs in San Francisco, CA
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI
As a Site Reliability Engineer, you will ensure system uptime, manage CI/CD pipelines, and enhance security and observability while troubleshooting issues in a collaborative environment.
Top Skills:
AWS, Azure, CloudFormation, Datadog, Docker, GCP, Grafana, Kubernetes, Prometheus, Terraform
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
The Embedded Site Reliability Engineer will develop and maintain software applications for Bitcoin mining, focusing on embedded systems and cloud observability. Responsibilities include software testing, bug triage, and collaboration with engineering teams to optimize performance and reliability.
Top Skills:
C, C++, Datadog, Elastic, Go, Grafana, JavaScript, Linux, Python, Rust, Splunk, SQL, TypeScript
Reposted 2 Days Ago
Big Data • Cloud • Software • Database
As a Staff Engineer in the InfraSec team, you'll lead the design and deployment of security solutions for cloud platforms, automate monitoring, and manage security tooling while mentoring a small team of SREs.
Top Skills:
Ansible, AWS, Azure, CloudFormation, GCP, Go, Terraform
Big Data • Information Technology • Productivity • Software • Analytics • Business Intelligence • Consulting
Join Celonis' Reliability Engineering team to ensure the health and performance of their platform, applying SRE principles and mentoring engineers while leading reliability efforts for microservices on Kubernetes.
Top Skills:
Argo CD, AWS, Azure, Datadog, GCP, GitHub Actions, Java, Kubernetes, Kustomize, Python, Spring Framework, Terraform
Financial Services
As a Principal Site Reliability Engineer, you'll architect reliability solutions, lead observability initiatives, and mentor teams for enhanced operational efficiency.
Top Skills:
Cloud-Native Instrumentation, OpenTelemetry, Streaming Data Platforms
Reposted 10 Days Ago
Big Data • Cloud • Software • Database
As a Staff Site Reliability Engineer, you will empower developers by optimizing MongoDB Atlas, ensuring seamless performance across multiple cloud platforms while fostering a supportive culture.
Top Skills:
AWS, GCP, Azure, MongoDB
Artificial Intelligence • Cloud • Consumer Web • eCommerce • Information Technology • Software
The Site Reliability Engineer will ensure application performance, architect monitoring tools, analyze systems, provide reliability recommendations, and support production.
Top Skills:
Ansible, CentOS, Datadog, Docker, Linux, MySQL, New Relic, RHEL, SQL
Cloud • Hardware • Security • Software
Manage and enhance infrastructure, automate processes, define roadmaps, and support engineering teams while ensuring uptime and efficiency.
Top Skills:
Argo CD, AWS, Kubernetes, Python, Terraform
Cloud • Information Technology • Security • Software • Cybersecurity
Join a talented team as a Systems Reliability Engineer to enhance the Cloudflare platform's availability and performance using automation and monitoring tools.
Top Skills:
Ansible, Apache Airflow, Chef, Consul, Docker, Go, Grafana, Linux, Nginx, Nomad, Postgres, Prometheus, Puppet, Python, Rust, SaltStack, SQL, Temporal
Big Data • Cloud • Productivity • Software • Database • Analytics • Automation
The Site Reliability Engineer will support engineering teams, enhance system resilience, and drive scalable infrastructure practices.
Top Skills:
AWS Services, Grafana, Honeycomb, Linux, Python, Terraform
Artificial Intelligence • Fintech • Hardware • Information Technology • Sales • Software • Transportation
Design, scale, and manage AWS services for IoT devices. Collaborate on infrastructure, optimize performance, and ensure high availability of services.
Top Skills:
AWS, Bash, Go, Helm, Kubernetes, Python, Ruby, Terraform
Artificial Intelligence • Healthtech • Machine Learning • Natural Language Processing • Software
The AWS Cloud Architect will design, build, and optimize cloud infrastructure, ensuring scalability and security while mentoring junior SREs and defining cloud strategy.
Top Skills:
Ansible, AWS API Gateway, AWS CloudFront, AWS CloudTrail, AWS CloudWatch, AWS DocumentDB, AWS EC2, AWS EKS, AWS Lambda, AWS RDS, AWS S3, AWS Secrets Manager, AWS SSM, Docker, Grafana, HashiCorp Consul, HashiCorp Terraform, HashiCorp Vault, Kubernetes, New Relic, Prometheus
Sales • Software • Automation
Join the Infrastructure Team to build and maintain critical systems, automating database lifecycles and enhancing disaster recovery with a focus on resilience and simplicity.
Top Skills:
Ansible, Argo CD, AWS, ClickHouse, Docker, Elasticsearch, Flask, GitHub Actions, Grafana, Kubernetes, MongoDB, Postgres, Python, Redis, Terraform
Artificial Intelligence • Healthtech • Software
As a Site Reliability Engineer, you will manage cloud infrastructure, implement observability, and ensure system reliability by collaborating with engineering teams and maintaining databases.
Top Skills:
Azure, Bash, Git, Kubernetes, Postgres, Python, Redis, SQL, TypeScript, VS Code
Reposted 9 Days Ago
Big Data • Cloud • Software • Database
This role involves building and maintaining observability services, ensuring service reliability, and collaborating with other teams on best practices.
Top Skills:
AWS, Fluent Bit, GCP, Jaeger, Kubernetes, Azure, Quickwit, Splunk, Vector, VictoriaMetrics
Artificial Intelligence • Machine Learning • Software
As a Staff Site Reliability Engineer, you will enhance the reliability, scalability, and performance of production services by applying SRE principles, implementing observability practices, automating processes, and collaborating with engineering teams.
Top Skills:
AWS, Azure, CloudFormation, Datadog, Docker, ELK Stack, GCP, Go, Grafana, Jaeger, Kubernetes, OpenTelemetry, OpenTofu, Prometheus, Python, Terraform
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, OpenTelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
Fintech • Machine Learning • Payments • Software • Financial Services
Lead a team of developers to create cloud-based solutions while driving transformations using DevOps practices. Collaborate across teams to solve business challenges and mentor engineers.
Top Skills:
Ansible, AWS, Docker, Go, Java, Kubernetes, Python, Ruby, SQL, Terraform
Artificial Intelligence • Information Technology
As a Site Reliability Engineer, maintain user-facing services, implement best practices for reliability, and manage production incidents.
Top Skills:
Ansible, Cloud Services, Kubernetes, Programming Languages, Terraform
Reposted 19 Days Ago
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills:
Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript
Financial Services
The Senior Cluster Site Reliability Engineer will enhance the research compute cluster's uptime, reliability, and performance through engineering and operational improvements, ensuring high availability for researchers working on machine learning problems.
Top Skills:
Ansible, AWS, Ceph, Docker, ELK, GCP, Grafana, Horovod, HPC, InfiniBand, Kubeflow, Kueue, Loki, Lustre, MLflow, OpenTelemetry, Podman, Prometheus, Python, RDMA, Ruby, S3, Singularity, Slurm, Terraform
Information Technology
As a Site Reliability Engineer, you'll design and operate scalable storage systems and optimize performance for AI research data management.
Top Skills:
Go, Kubernetes, Pulumi, Rust
Automotive
As a Senior Technical Program Manager for SRE & On-call Excellence, you will manage projects that improve incident response, on-call protocols, and system reliability, collaborating with various engineering teams to drive successful execution.
Top Skills:
Cloud Infrastructure, DevOps Practices, Distributed Systems, Site Reliability Engineering
Energy
The Site Reliability Engineer will design and implement scalable systems, automate IT infrastructure management, and support deployed systems, ensuring high availability and performance.
Top Skills:
Active Directory, Ansible, AWS, Azure, Chef, JSON, Linux, Puppet, Python, REST, VMware, Windows Server, YAML