Get the job you really want.
Maximum of 25 job preferences reached.
Top Reliability Engineer Jobs in San Francisco, CA
eCommerce • Retail • Software
As a Senior Database Reliability Engineer, you will manage database systems, enhance observability through automation, and lead database upgrade initiatives while ensuring security and reliability.
Top Skills:
AWSCi/CdDynamoDBElasticsearchMongoDBMySQLPostgresPowershellPythonRedisSQL Server
Database
Manage and optimize Postgres databases at scale on AWS RDS, own reliability/monitoring, execute low-downtime upgrades and migrations, troubleshoot production issues, participate in on-call rotation, and collaborate with platform and product teams.
Top Skills:
Aws RdsBarmanGoPgbackrestPostgresTypescriptWal-G
Legal Tech • Software
Lead automation and optimization of Filevine's data platform: performance tune MSSQL/Postgres, optimize Snowflake, provision infrastructure with Terraform/AWS, run stateful containers on Kubernetes, integrate AI/LLM and MCP for operational automation, manage CI/CD, capacity planning, documentation, and serve in 24/7 on-call rotation.
Top Skills:
AWSC#DapperDockerDynamoDBEntity FrameworkGitlabKubernetesLlmsMcp (Model Context Protocol)Microsoft Sql Server (Mssql)Octopus DeployOpensearchPostgresPowershellPythonRedisSnowflakeTerraform
Reposted 6 Days AgoSaved
Easy Apply
Easy Apply
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Site Reliability Engineer will enhance CI/CD frameworks, automate cloud infrastructure, manage Kubernetes and AWS services, and ensure operational excellence.
Top Skills:
AnsibleAWSBashChefCi/CdDockerGitKubernetesPuppetPythonRubySaltTerraform
Artificial Intelligence • Other • Security • Software • Analytics • Big Data Analytics
The Lead Site Reliability Engineer will oversee the Infrastructure SRE team, focusing on system reliability, automation, and mentoring while collaborating with product engineering.
Top Skills:
Ci/CdDatadogDockerElk StackGitopsGoKubernetesLinux/UnixNew RelicNoSQLPrometheusPythonSQLStackdriverTerraform
Software
Own reliability, performance, and scalability of PostgreSQL infrastructure. Implement HA, replication, observability, capacity planning, automation, and DR. Support engineering teams with migrations, query optimization, on-call incident response, runbooks, and tooling to enable safe DB operations.
Top Skills:
AnsibleAuroraAws RdsChefDatadogDynamoDBElasticacheGoGrafanaIndexingMvccPatroniPgbouncerPostgresPrometheusPythonQuery PlannerReplicationRubySQLTerraformVacuum TuningWal
Reposted 8 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWSAzureDnsGCPGoHTTPLinuxPythonRubyTls
Fintech • Financial Services
The Systems Reliability Engineer supports MEMX exchange platforms by responding to incidents, debugging issues, improving processes, and working with cross-functional teams to ensure platform availability.
Top Skills:
AnsibleBashChefLinuxLinux ShellMonitoring ToolsPuppetPython
Reposted 18 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
As a Staff Engineer in the InfraSec team, you'll lead the design and deployment of security solutions for cloud platforms, automate monitoring, and manage security tooling while mentoring a small team of SREs.
Top Skills:
AnsibleAWSAzureCloudFormationGCPGoTerraform
Reposted 11 Days AgoSaved
Easy Apply
Easy Apply
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
As a Principal Software Engineer on the SRE team, lead best practices adoption, mentor engineers, and improve system reliability and user experience through automation and collaboration.
Top Skills:
CdkCloudFormationDatadogGoJavaScriptPrometheusPythonTerraformTypescript
12 Days AgoSaved
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Conversational AI
The engineer will build and operate AI/ML infrastructure, managing services on AWS and bare metal, using tools like Kubernetes and Terraform.
Top Skills:
AWSBashGoKubernetesPythonSlurmTerraform
Reposted 16 Days AgoSaved
Easy Apply
Easy Apply
Energy
As a Reliability Engineer, you'll define system reliability requirements, execute various reliability tests, and analyze data to improve product performance in hardware engineering for energy storage.
Top Skills:
JmpMinitabPython
New
Track Smarter, Apply Better.
Ditch the spreadsheets. Organize your job search with our freeApplication Tracker.
Use For Free
eCommerce • Healthtech • Kids + Family • Retail • Social Media
Seeking a Senior Software Engineer, Site Reliability to ensure system stability, scalability, and reliability, while optimizing AWS infrastructure using modern DevOps practices and tools like Terraform, Docker, and Kubernetes.
Top Skills:
AWSCircleCICronitorDatadogDockerGithub ActionsJenkinsKubernetesMySQLPagerdutyReactRedisRuby On RailsSentrySidekiqTerraform
Cloud
As a Database Reliability Engineer, oversee MySQL database services, ensure performance and availability, coordinate infrastructure tuning, and enhance operational processes.
Top Skills:
ChefCloudsqlDockerGrafanaKubernetesLinuxMySQLPostgresRds AuroraTerraform
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI
Own and improve critical production services end-to-end by writing production-quality code: instrumenting services, eliminating performance bottlenecks, building deployment and observability platforms, defining SLOs, running incident response and post-mortems, capacity planning and cost optimization, maintaining CI/CD, and embedding with product teams to design reliable systems.
Top Skills:
AWSC++Ci/CdContainer OrchestrationGoObservability StacksPythonRust
Artificial Intelligence • Hardware • Robotics • Software
The role involves developing and executing test strategies for autonomous systems, collaborating with engineering teams for reliability, and analyzing data for risk assessments.
Top Skills:
PythonSQL
eCommerce • Retail • Software
The Senior Database Reliability Engineer ensures database availability, reliability, and efficiency, driving initiatives for upgrades, automation, and security while mentoring team members.
Top Skills:
AWSDynamoDBElasticsearchMongoDBMySQLPostgresPowershellPythonRedisSQL Server
Food
The Reliability Engineer will manage maintenance of fixed assets, focusing on equipment reliability, predictive maintenance, and collaboration to reduce downtime and improve performance metrics of packaging operations.
Top Skills:
Automation EquipmentThermoforming Packaging MachinesTpm
Big Data • Cloud • Software • Database
Lead a 6–8 person team managing the Kubernetes fleet and core runtime components (CoreDNS, cert-manager, Gatekeeper). Define technical vision and roadmap, guide migration from Terraform to Operator-driven lifecycle management, perform hands-on architectural reviews and PR reviews, resolve operational incidents, and collaborate with engineering leaders and stakeholders.
Top Skills:
AlertingAWSAzureCert-ManagerContainerizationCorednsCrossplaneGatekeeperGCPKubernetesLoad BalancingObservabilityOperatorsService MeshTerraform
Artificial Intelligence • Information Technology • Robotics • Software
Sieve is seeking a Founding Reliability Engineer to build and maintain infrastructure for petabyte-scale video workloads, focusing on reliability and security. Responsibilities include incident response, cloud security, and observability systems management.
Top Skills:
AWSC++CloudflareGCPGoOpentelemetryOraclePrometheusPythonRustTerraformVictoriametrics
Reposted 18 Days AgoSaved
Easy Apply
Easy Apply
Energy
As a Reliability Engineer, you will define system reliability requirements, conduct tests, analyze failures, and collaborate on design improvements for hardware products.
Top Skills:
JmpMinitabPython
AdTech • Artificial Intelligence • Marketing Tech • Software • Analytics
The Senior Site Reliability Engineer will enhance system reliability, develop production-grade code, implement observability tools, conduct root cause analyses, and collaborate on system design for scalability.
Top Skills:
ArgocdCi/CdDockerGitopsGoGrafanaHoneycombJenkinsKubernetesOpentelemetryPrometheusPythonTerraform
Cloud • Internet of Things • Agriculture
The Hardware Test & Reliability Engineer will ensure product validation, oversee reliability testing, lead root cause analysis, and automate testing processes using Python to enhance data-driven quality improvement.
Top Skills:
DmmsLogic AnalyzersOscilloscopesPython
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The role involves supporting network infrastructure, automating cloud infrastructure, managing CI/CD workflows, and ensuring operational excellence in IT support, including incident response and security practices.
Top Skills:
AnsibleAWSBashDockerGitKubernetesPythonRubyTerraform
Reposted 16 Days AgoSaved
Easy Apply
Easy Apply
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills:
AnsibleAws EcsKubernetesLinuxPythonTerraform
Popular Job Searches
All Filters
Total selected ()
No Results
No Results



.png)
.png)
























