Top Reliability Engineer Jobs in San Francisco, CA
Reposted 2 Days Ago
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
The Senior Software Engineer will lead efforts in site reliability engineering, improving monitoring, incident response, and tooling to enhance system reliability and performance.
Top Skills:
CDK, Datadog, Go, JavaScript, Prometheus, Pulumi, Python, Terraform, TypeScript
Insurance • Sales • Software
As a Cloud & Site Reliability Engineer, you will ensure the reliability and availability of software systems, participate in agile ceremonies, build infrastructure with IaC, and manage on-call duties.
Top Skills:
Cloud, Delivery Pipelines, DevOps, Infrastructure as Code
Digital Media • Kids + Family • Mobile • Software • Sports
The Senior DevOps Engineer will enhance observability and performance monitoring systems, implement SRE best practices, develop custom dashboards, and collaborate across teams to ensure reliability and scalability of cloud infrastructure.
Top Skills:
Datadog, GitHub Actions, GitLab CI, Jenkins, Prometheus, Python, Terraform, TypeScript
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
The role involves developing and maintaining cloud services for reliability and scalability, optimizing architecture, and mentoring other developers while focusing on innovative software practices.
Top Skills:
AWS, Cassandra, Elasticsearch, Go, Java, Kafka, Kotlin, Node.js, Python, Scala
Artificial Intelligence • Cloud • Sales • Security • Software • Cybersecurity • Data Privacy
As a Staff Site Reliability Engineer, you will enhance the reliability of SailPoint's identity security services, coach engineers on best practices, and influence architectural designs for scalability.
Top Skills:
AWS, Go, Grafana, Honeycomb, Java, Kibana, Kubernetes, Prometheus, Python, Terraform
Reposted 4 Days Ago
Cloud • Security • Software • Cybersecurity • Automation
The Senior Site Reliability Engineer is responsible for maintaining user-facing services, managing database operations, and optimizing cloud infrastructure at GitLab. Key responsibilities include designing and maintaining ClickHouse and PostgreSQL clusters, implementing monitoring systems, and ensuring security compliance. The role requires strong technical skills in database management and cloud automation, along with leadership and communication abilities.
Top Skills:
Ansible, Chef, ClickHouse, Go, Grafana, Helm, Kubernetes, Linux, Postgres, Prometheus, Python, Ruby, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Senior Software Engineer will enhance system reliability, manage projects for scalability, develop automation tools, and mentor engineering teams.
Top Skills:
AWS, Azure, Docker, EC2, GCP, Go, Kubernetes, Ruby, Terraform
Artificial Intelligence • Cloud • Fintech • Professional Services • Software • Analytics • Financial Services
As a Senior Software Engineer in the SRE team, you will design and develop solutions for reliability and performance, collaborating with engineering teams on internal tools and observability features.
Top Skills:
AWS, Dart, Docker, Git, Go, Java, Kafka, Kubernetes, MySQL, Nginx, OpenTelemetry, Postgres, Python, React, Snowflake, Terraform, TypeScript
Featured Jobs
14 Days Ago
Cloud • Software
The Senior Site Reliability Engineer will manage and optimize AWS infrastructure while ensuring high availability and performance of the ThousandEyes platform.
Top Skills:
AWS, Docker, ECS, EKS, Go, Python, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Site Reliability Engineer will build and maintain secure system architectures for various client platforms and manage DevOps tooling.
Top Skills:
Ansible, AutoPkg, AWS, Azure, GCP, Go, MicroMDM, Munki, NanoMDM, Puppet, Python, Ruby, Terraform
Fintech • Social Impact • Financial Services
The Site Reliability Engineer will enhance the reliability, availability, and performance of the AWS platform, collaborating with various teams and improving observability, monitoring, and automation.
Top Skills:
AWS, C#, C/C++, CI/CD, Datadog, ECS, ELK Stack, Go, Grafana, Java, JavaScript, Kubernetes, Linux, Node.js, Prometheus, Python, Ruby, Terraform, YAML
Cloud • Fintech • Information Technology • Machine Learning • Software
Responsible for incident management and reliability for Xero. Lead critical outage responses, develop scalable processes, and enhance service reliability through cross-team collaboration and ongoing training.
Top Skills:
AWS, BGP, DNSSEC, IPsec, Python, SSL/TLS, TCP/IP
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
Develop and manage scalable infrastructure, automate server provisioning, maintain reliability and monitoring of services, and support fleet operations.
Top Skills:
Ansible, AWS, Chef, DHCP, DNS, GCP, Linux, Azure, NTP, Puppet, PXE
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Staff Site Reliability Engineer at Coinbase will improve system reliability, mentor engineers, automate processes, and oversee software integrity, focusing on high-quality coding and performance tuning.
Top Skills:
AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kibana, Kubernetes, Ruby, Terraform
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will maintain system stability, automate processes, handle incidents, and enhance reliability while collaborating with dev teams.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, OpenTelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
As a Senior Site Reliability Engineer, you will manage AI tools, ensure system reliability, develop automation scripts, and collaborate across teams to enhance AI infrastructure.
Top Skills:
AWS, GCP, Python, Go, Java, Ansible, Terraform, Bash
Artificial Intelligence • Cloud • Sales • Security • Software • Cybersecurity • Data Privacy
As a Principal SRE, you will drive reliability practices for the Identity Security Cloud platform, coach teams, manage architecture and capacity, and provide technical leadership while improving operational excellence.
Top Skills:
AWS, Go, Grafana, Honeycomb, Java, Kibana, Kubernetes, Prometheus, Python, Terraform
Blockchain • Information Technology • Software • Cryptocurrency • Web3
As a Site Reliability Engineer, you will enhance infrastructure reliability and developer productivity, implement best practices, and mentor teams on reliability initiatives.
Top Skills:
Argo, AWS, Chef, Datadog, Docker, Flux, GitOps, Grafana, Kubernetes, Prometheus, Pulumi, Puppet, Python, Terraform, TypeScript
20 Days Ago
Hardware • Information Technology • Security • Software • Cybersecurity • Conversational AI
As a Lead Site Reliability Engineer, you will enhance cloud infrastructure, automate operations, and troubleshoot complex production issues in a secure environment.
Top Skills:
Ansible, AWS, Bash, Chef, Direct Connect, Docker, Go, Kubernetes, Puppet, Python, REST, Ruby, Scala, SOAP, TLS, Transit Gateway, Unix/Linux, VPC
AdTech • Cloud • Information Technology • Marketing Tech • Software
Lead the design and implementation of reliability strategies for SMS infrastructure, focusing on automation, performance tuning, and collaboration with cross-functional teams.
Top Skills:
Ansible, Asterisk, AWS, Azure, Docker, Elasticsearch, GCP, Git, GitLab, HAProxy, Interconnects, Jenkins, K8s, Kannel, Linux, MySQL, NAT, Nginx, OpenSIPS, REST, RTP, SIP, SMS, sngrep, SNMP, Spring Boot, Terraform, Tomcat, VPN, WAFs, Wireshark
Cloud • Fintech • Information Technology • Machine Learning • Software
Lead the Product Site Reliability Engineering team to enhance system reliability and performance, drive automation, and ensure observability best practices.
Top Skills:
AWS, Azure, GCP, SRE Tools
Artificial Intelligence • Enterprise Web • Machine Learning • Natural Language Processing • Software • Conversational AI • Automation
As a Site Reliability Engineer, you'll enhance infrastructure security, automate deployments, optimize CI/CD processes, and drive engineering best practices while ensuring compliance and observability.
Top Skills:
AWS Cloud, Elasticsearch, Go, JavaScript, MongoDB, Node.js, React, Redis, Terraform
Big Data • Cloud • Software • Database
Seeking a Senior Site Reliability Engineer to support and maintain the MongoDB Atlas platform, focusing on automation, system design, and operational excellence.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
19 Days Ago
Big Data • Fintech • Mobile • Payments • Financial Services
As a Senior Software Engineer in SRE, you will lead teams in building reliable backend systems, driving incident management, and fostering a culture of quality, while supporting product development and handling operational metrics.
Top Skills:
AWS, Kotlin, Kubernetes, MySQL, Python
19 Days Ago
Big Data • Fintech • Mobile • Payments • Financial Services
As a Staff Software Engineer in SRE, you will design and enhance backend systems, ensuring reliability and operational excellence while developing a culture of quality and mentorship within the team.
Top Skills:
AWS, Kotlin, Kubernetes, MySQL, Python, Spark