Top Reliability Engineer Jobs in San Francisco, CA
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
As a Senior Software Engineer focused on Site Reliability Tooling, you'll enhance system reliability, implement SRE practices, and build automation tools to support site reliability across Upstart's infrastructure.
Top Skills:
Cdk, CloudFormation, Datadog, Go, JavaScript, Kubernetes, Prometheus, Python, Terraform, Typescript
Industrial • Manufacturing
Lead reliability strategy for HVAC components, develop predictive life models, design tests, and improve product durability. Analyze data and mentor junior engineers.
Top Skills:
Accelerated Test Methods, Hvac Systems, Iot Devices, Predictive Life Models, Weibull Modeling
Appliances
Lead reliability strategies for HVAC components, develop predictive models, conduct tests, analyze data, and mentor junior engineers.
Top Skills:
Accelerated Life Testing, Corrosion Testing, Environmental Chambers, Hvac Systems, Reliability Engineering, Vibration Tables, Weibull Analysis
Sales • Software • Automation
Join the Infrastructure Team to build and maintain critical systems, automating database lifecycles and enhancing disaster recovery with a focus on resilience and simplicity.
Top Skills:
Ansible, Argocd, AWS, Clickhouse, Docker, Elasticsearch, Flask, Github Actions, Grafana, Kubernetes, MongoDB, Postgres, Python, Redis, Terraform
Marketing Tech
The Cloud Reliability Engineer develops, configures, and deploys cloud tools, enhances applications, ensures observability, and participates in on-call rotations.
Top Skills:
AWS, Ci/Cd, Docker, Github Actions, Go, Google Bigquery, GCP, Kubernetes, Linux, Python, SQL, Terraform
Artificial Intelligence • Healthtech • Software
As a Site Reliability Engineer, you will manage cloud infrastructure, implement observability, and ensure system reliability by collaborating with engineering teams and maintaining databases.
Top Skills:
Azure, Bash, Git, Kubernetes, Postgres, Python, Redis, SQL, Typescript, Vscode
Big Data • Cloud • Software • Database
This role involves building and maintaining observability services, ensuring service reliability, and collaborating with other teams on best practices.
Top Skills:
AWS, Fluentbit, GCP, Jaeger, Kubernetes, Azure, Quickwit, Splunk, Vector, Victoriametrics
Information Technology • Security • Cybersecurity
Responsible for managing Oracle RAC databases, optimizing performance, ensuring security and integrity, and providing 24x7 support for production applications.
Top Skills:
Cassandra, Ceph, Elasticsearch, Kafka, Oracle, Redis
Cloud
The role involves designing and optimizing PostgreSQL clusters, automating database tasks, and ensuring high availability and performance while collaborating with other engineering teams.
Top Skills:
Ansible, Datadog, Go, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Python, Terraform
Artificial Intelligence • Cybersecurity
The Database Reliability Engineer will ensure database availability, performance, scalability, and security across AWS, collaborating with application and security teams.
Top Skills:
AWS, Crossplane, Datadog, Gitlab Ci/Cd, Kubernetes, NoSQL, Opensearch, Postgres, Terraform
Artificial Intelligence • Machine Learning • Software
As a Staff Site Reliability Engineer, you will enhance the reliability, scalability, and performance of production services by applying SRE principles, implementing observability practices, automating processes, and collaborating with engineering teams.
Top Skills:
AWS, Azure, CloudFormation, Datadog, Docker, Elk Stack, GCP, Go, Grafana, Jaeger, Kubernetes, Opentelemetry, Opentofu, Prometheus, Python, Terraform
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills:
AWS, Azure, CloudFormation, Docker, GCP, Grafana, Kubernetes, Memcached, New Relic, Opentelemetry, Postgres, Prometheus, Pulumi, Redis, Sentry, Terraform
Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
The Senior Engineer will automate and ensure the reliability of large-scale distributed systems, troubleshoot server issues, and manage operational aspects for high availability and performance.
Top Skills:
Ansible, C++, Chef, Docker, Elk, Freenas, Go, Grafana, Iscsi, Java, Kubernetes, Linux, Nas, Nfs, Object Storage, Prometheus, Puppet, Python, San, VMware, Windows
Aerospace • Hardware • Logistics • Robotics • Software • Transportation
The Design for Reliability Engineer is responsible for ensuring the safety and reliability of drone-delivery systems through testing, statistical analysis, and innovative design solutions.
Top Skills:
Jmp, Matlab, Minitab, Python, Reliasoft
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
The Site Reliability Engineer will build and maintain infrastructure, improve software systems, develop scalable microservices, and ensure quality software delivery.
Top Skills:
AWS, Go, Google Cloud Platform, Java, Kubernetes, Azure, SQL
Fintech • Machine Learning • Payments • Software • Financial Services
Lead a team of developers to create cloud-based solutions while driving transformations using DevOps practices. Collaborate across teams to solve business challenges and mentor engineers.
Top Skills:
Ansible, AWS, Docker, Go, Java, Kubernetes, Python, Ruby, SQL, Terraform
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWS, Azure, Dns, GCP, Go, HTTP, Linux, Python, Ruby, Tls
Consumer Web • Mobile
As a Site Reliability Engineer at Patreon, you'll improve AWS infrastructure, implement SRE practices, enhance Kubernetes capabilities, and develop automation tools.
Top Skills:
Ansible, AWS, Chef, Kubernetes, Puppet, Python, Terraform
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills:
Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, Typescript
Financial Services
The Senior Cluster Site Reliability Engineer will enhance the research compute cluster's uptime, reliability, and performance through engineering and operational improvements, ensuring high availability for researchers working on machine learning problems.
Top Skills:
Ansible, AWS, Ceph, Docker, Elk, GCP, Grafana, Horovod, Hpc, Infiniband, Kubeflow, Kueue, Loki, Lustre, Mlflow, Opentelemetry, Podman, Prometheus, Python, Rdma, Ruby, S3, Singularity, Slurm, Terraform
Artificial Intelligence • Software
The Network Operations Engineer will lead site operations, ensuring network reliability, handling incidents, coordinating hardware repairs, and supporting datacenter deployments. Responsibilities include executing maintenance runbooks and mentoring junior engineers while collaborating with cross-functional teams.
Top Skills:
Ansible, Bgp, Clos Topologies, Evpn/Vxlan, High-Radix Switching, Python
Information Technology
As a Site Reliability Engineer, you'll design and operate scalable storage systems and optimize performance for AI research data management.
Top Skills:
Go, Kubernetes, Pulumi, Rust
Automotive
As a Senior Technical Program Manager for SRE & On-call Excellence, you will manage projects that improve incident response, on-call protocols, and system reliability, collaborating with various engineering teams to drive successful execution.
Top Skills:
Cloud Infrastructure, Devops Practices, Distributed Systems, Site Reliability Engineering
Energy
The Site Reliability Engineer will design and implement scalable systems, automate IT infrastructure management, and support deployed systems, ensuring high availability and performance.
Top Skills:
Active Directory, Ansible, AWS, Azure, Chef, JSON, Linux, Puppet, Python, Rest, VMware, Windows Server, Yaml
Edtech
The Senior Site Reliability Engineer will ensure product reliability and performance, develop monitoring and alerting systems, and propose architectural changes.
Top Skills:
AWS, Bash, C, C++, Docker, GCP, Java, Kubernetes, Perl, Python