Get the job you really want.
Top Reliability Engineer Jobs in San Francisco, CA
Cloud • Information Technology • Security • Software • Cybersecurity
As a Network Reliability Engineer at Cloudflare, you will enhance network resilience by managing the technical operations of the core data center network, automating operational tasks, and contributing to system design. You'll collaborate with a team to develop and improve software solutions that streamline deployment and support a high-performance network.
Top Skills:
Airflow • Ansible • BIRD • C • C++ • Chef • Configuration Management Frameworks • Consul • Cumulus • Docker • EOS • FRR • Go • GoBGP • Junos • Kubernetes • Linux • Linux Kernel • Linux Software Packaging • Network Reliability Engineering • NX-OS • Open Source Routing Daemons • Prometheus • Python • Rust • SaltStack • SONiC Network Operating Systems • Temporal
Big Data • Information Technology • Productivity • Software • Analytics • Business Intelligence • Consulting
As a Senior Reliability Engineer at Celonis, you will ensure platform reliability and performance, lead incident management, and collaborate on system enhancements using SRE principles and automation.
Top Skills:
AWS • Azure • CI/CD • GCP • Java • Kubernetes • Python • Spring Framework • Terraform
Marketing Tech • Mobile • Software
As a Database Reliability Engineer, you will ensure uptime for internal services, improve infrastructure automation, and manage incidents while collaborating with engineering teams.
Top Skills:
Chef • Docker • Kafka • Kubernetes • MongoDB • Redis • Ruby on Rails • Terraform
Healthtech • Pharmaceutical • Telehealth
The Database Reliability Engineer will ensure database performance, scalability, and reliability, applying SRE principles and driving automation while collaborating with engineering teams.
Top Skills:
AWS • Bash • Datadog • Go • Infrastructure-as-Code • Postgres • Pulumi • Python • RDS • Splunk • Terraform
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
The role involves maintaining and automating database systems, working with large-scale data and cloud infrastructures, and supporting engineering teams.
Top Skills:
AWS • Cassandra • Chef • Elasticsearch • Kafka • MySQL • Postgres • Ruby • Salt • ZooKeeper
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
The role involves maintaining data components, developing infrastructure services for the engineering team, and ensuring data security and availability.
Top Skills:
AWS • Cassandra • Chef • Elasticsearch • Kafka • MySQL • Postgres • Ruby • Salt • ZooKeeper
Cloud • Fintech • Information Technology • Machine Learning • Software
As an Engineer in SRE Observability, you will enhance operational excellence, build monitoring tools, and support teams to improve software reliability.
Top Skills:
C# • CI/CD • Datadog • Docker • Dynatrace • Go • IaC • JavaScript • Kubernetes • Linux • New Relic • OpenTelemetry • Python • Scalyr • SignalFx • Splunk • Sumo Logic
Blockchain • Information Technology • Software • Cryptocurrency • Web3
As a Site Reliability Engineer, you will enhance infrastructure reliability and developer productivity, implement best practices, and mentor teams on reliability initiatives.
Top Skills:
Argo • AWS • Chef • Datadog • Docker • Flux • GitOps • Grafana • Kubernetes • Prometheus • Pulumi • Puppet • Python • Terraform • TypeScript
Featured Jobs
Hardware • Information Technology • Security • Software • Cybersecurity • Conversational AI
As a Lead Site Reliability Engineer, you will enhance cloud infrastructure, automate operations, and troubleshoot complex production issues in a secure environment.
Top Skills:
Ansible • AWS • Bash • Chef • Direct Connect • Docker • Go • Kubernetes • Puppet • Python • REST • Ruby • Scala • SOAP • TLS • Transit Gateway • Unix/Linux • VPC
Cloud • Fintech • Information Technology • Machine Learning • Software
The role involves implementing chaos engineering experiments to enhance system resilience, collaborating with teams to optimize performance, and maintaining engineering frameworks.
Top Skills:
.NET • AWS • Azure • C# • C++ • Chaos Monkey • GCP • Go • Gremlin • Java • Kubernetes • Litmus • Python
Reposted Yesterday
Cloud • Software
The Principal Site Reliability Engineer will oversee mission-critical datastores, ensuring reliability, scalability, and performance while leading automation efforts and mentoring the engineering team.
Top Skills:
AWS • Elasticsearch • Go • Kafka • Kubernetes • MongoDB • MySQL • Python • Terraform
Cloud • Fintech • Information Technology • Machine Learning • Software
Lead the Product Site Reliability Engineering team to enhance system reliability and performance, drive automation, and ensure observability best practices.
Top Skills:
AWS • Azure • GCP • SRE Tools
Cloud • Information Technology • Productivity • Security • Software • App development • Automation
As a Site Reliability Engineer at Atlassian, you will manage and improve cloud infrastructure, automate processes, and ensure the reliability and performance of services. You will build monitoring into code, troubleshoot, and communicate technical issues effectively. Experience with public cloud offerings and backend engineering is essential.
Big Data • Cloud • Software • Database
Seeking a Site Reliability Engineer with strong networking skills to build and maintain secure infrastructure for service communication. The role involves collaboration, support, and participation in a 24/7 on-call rotation.
Top Skills:
AWS • Azure • BGP • Cloud Computing • DNS • GCP • Kubernetes • Load Balancing • SDN • Service Mesh • TCP/IP • TLS
Big Data • Cloud • Software • Database
Design and build global cloud infrastructure, automate monitoring services, optimize performance and manage critical production systems while participating in on-call rotations.
Top Skills:
Amazon Web Services • Automation Tools • Google Compute • Kubernetes • Linux • Azure
Big Data • Cloud • Software • Database
Lead the Fabric team as a Site Reliability Engineer, focusing on building resilient infrastructure for secure service communication, while overseeing team direction and addressing technical issues.
Top Skills:
AWS • Azure • BGP • DNS • GCP • Kubernetes • TCP/IP • TLS/mTLS • VPCs
Big Data • Cloud • Software • Database
Design and build infrastructure for cloud services; improve resilience, automation, and monitoring; participate in on-call rotation.
Top Skills:
Amazon Web Services • CI/CD • GCP • Kubernetes • Linux • Azure • MongoDB
Big Data • Cloud • Software • Database
The Lead Site Reliability Engineer will oversee a team in building robust networking infrastructure and ensure system resiliency and security for MongoDB services.
Top Skills:
AWS • Azure • BGP • DNS • GCP • Kubernetes • SDN • TCP/IP • TLS/mTLS
Big Data • Cloud • Software • Database
The Staff Site Reliability Engineer will manage multi-cloud infrastructure, focusing on secure communication, network architecture, and service mesh for MongoDB's products, ensuring high availability and resilience.
Top Skills:
AWS • Azure • BGP • DNS • GCP • Kubernetes • mTLS • SDN • TLS • VPCs
7 Days Ago
Hardware • Information Technology • Security • Software • Cybersecurity • Conversational AI
As a Senior Site Reliability Engineer, you'll design and manage testing infrastructure and collaborate with engineering teams to improve services in a hybrid cloud environment.
Top Skills:
Argo CD • CircleCI • GitLab • Go • Helm • Java • Jenkins • Kubernetes • Python • Ruby • Terraform
8 Days Ago
Hardware • Information Technology • Security • Software • Cybersecurity • Conversational AI
The Lead Site Reliability Engineer will design, develop, and operate observability systems, ensuring service reliability in large distributed environments. Responsibilities include scaling observability systems, writing monitoring libraries, and collaborating with engineering teams.
Top Skills:
Ansible • Bash • Elasticsearch • Go • Kafka • Prometheus • Python • Ruby • Scala • Terraform
Cloud • Fintech • Information Technology • Machine Learning • Software
The role involves enhancing system reliability through Chaos Engineering practices, automating processes, and integrating tools for better data insights.
Top Skills:
Chaos Engineering • Site Reliability Engineering
9 Days Ago
Cloud • Software
The Senior Site Reliability Engineer will manage and optimize AWS infrastructure while ensuring high availability and performance of the ThousandEyes platform.
Top Skills:
AWS • Docker • ECS • EKS • Go • Python • Terraform
Artificial Intelligence • Information Technology • Machine Learning • Security • Software • Cybersecurity • Generative AI
Responsible for maintaining production environment reliability and availability, implementing automation to resolve operational issues, and collaborating with engineering teams on service and infrastructure improvements.
Top Skills:
AWS • Docker • Java • Kubernetes • Linux • Perl • PHP • Python • Ruby
Artificial Intelligence • Fintech • Information Technology • Software • Data Privacy
The Senior Site Reliability Engineer ensures SaaS products are stable and optimized, focusing on automation, monitoring, and collaboration within teams to maintain high service quality.
Top Skills:
AKS • Ansible • AppDynamics • Azure DevOps • Bash • C# .NET • Cosmos • Datadog • Dynatrace • EKS • Harness • Idera SQL Diagnostic Manager • Java • Jenkins • Kubernetes • New Relic • PowerShell • Python • Redgate SQL Monitor • SolarWinds Database Performance Analyzer • SQL • Terraform