Top Reliability Engineer Jobs in San Francisco, CA
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills:
AWS, Docker, GCP, Kubernetes
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Artificial Intelligence • Machine Learning • Generative AI
The Reliability/DFX Engineer will oversee DFX architecture and improve system reliability, working closely with teams on AI hardware design and implementation.
Top Skills:
Data Analysis, DFT, ML Chip Architecture, RTL Design, Silicon ATE
Artificial Intelligence • Software
The Site Reliability Engineer ensures the reliability and performance of the Devin and Windsurf products by managing incident response, CI/CD pipelines, and infrastructure as code, and by fostering a reliability culture within the engineering team.
Top Skills:
AWS, Azure, CI/CD, GCP, Kubernetes, Terraform
Artificial Intelligence • Software
As a Site Reliability Engineer at Mercor, you will ensure production reliability, develop the SRE function, and collaborate with engineering teams to maintain system performance.
Top Skills:
AWS, Kubernetes, Spacelift, Terraform
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
The Site Reliability Engineer at Zoox will manage the availability and resilience of services for autonomous vehicles, design systems, and lead incident resolution.
Top Skills:
Ansible, AWS, Azure, C++, CloudFormation, GCP, Go, Java, Kubernetes, Python, Salt, Terraform
Artificial Intelligence • Information Technology • Software
The role involves defining and evolving technical foundations for AI evaluation, optimizing performance, designing resilient systems, and collaborating with various teams for infrastructure improvements.
Top Skills:
Node.js, Postgres, Serverless Environments, TypeScript
Aerospace • Artificial Intelligence
The Site Reliability Engineer will architect and manage ground infrastructure for satellite systems, ensuring high availability, automating deployments, and optimizing data management systems.
Top Skills:
Ansible, AWS, Azure, C++, CloudFormation, EKS, ELK, GCP, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform
Consumer Web • eCommerce • Fashion • Retail
The Senior Site Reliability Engineer ensures the health of systems, automates processes, and collaborates on architecture to maintain uptime and reliability in production environments.
Top Skills:
Ansible, AWS, Azure, Datadog, Docker, Elasticsearch, GCP, Graphite, HAProxy, JavaScript, Jenkins, Kubernetes, MongoDB, Nagios, New Relic, Nginx, Node.js, RabbitMQ, Redis, Ruby, Terraform, Tomcat
Consumer Web • eCommerce • Fashion • Retail
The Software Engineer, SRE will develop, deploy, and support new product features while ensuring operational excellence and quality support in a fast-paced environment.
Top Skills:
AWS, Docker, Elasticsearch, HAProxy, JavaScript, Kubernetes, MongoDB, Nginx, Node.js, RabbitMQ, Redis, Ruby, Tomcat
Cloud
The role involves building and managing observability infrastructure in GCP, automating deployments, and optimizing data processes for high reliability.
Top Skills:
GKE, Go, GCP, Grafana, Kubernetes, OpenTelemetry, Python, Ruby, Splunk, Terraform
Artificial Intelligence • Machine Learning • Generative AI
As a Site Reliability Engineer, you will manage Kubernetes clusters, automate infrastructure, improve operational metrics, and enhance reliability across data centers.
Top Skills:
CloudFormation, Go, GPU, Kubernetes, Linux, Python, Terraform
Artificial Intelligence • Information Technology • Robotics • Software
As a Senior Reliability and Test Engineer, you will identify failure risks, develop new reliability tests, ensure regulatory compliance, and conduct testing on home robotics.
Top Skills:
Certification Standards, Reliability Testing, Statistical Modeling
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
Lead technical direction for software architecture and cross-team initiatives focusing on scaling consumer-facing systems and maximizing loan originations while maintaining compliance and system integrity.
Top Skills:
AWS, CI/CD, Docker, GitHub Actions, Infrastructure as Code, React, Ruby on Rails
Fintech • Software
The Senior Site Reliability Engineer ensures fast, stable SaaS products through automation, collaboration, monitoring, and implementing AI tools to enhance performance and reliability.
Top Skills:
AI Tools, Ansible, AppDynamics, AWS, Azure, Azure DevOps, Bash, C# .NET, Cosmos, Datadog, Dynatrace, Harness, Java, Jenkins, Kubernetes, New Relic, PowerShell, Python, SaaS, SQL, Terraform
Fintech • Financial Services
The Systems Reliability Engineer will support MEMX exchange platforms, handling incidents, improving processes, documenting actions, and debugging issues while collaborating with diverse teams to maintain operational efficiency.
Top Skills:
Ansible, Bash, Chef, Linux, Puppet, Python
Software
As a Staff Site Reliability Engineer, you'll lead reliability strategies, design scalable systems, improve observability, and mentor engineers to enhance system performance and resilience.
Top Skills:
AWS, Datadog, Grafana, Prometheus, Terraform
Software
As a Senior Site Reliability Engineer, you will ensure the reliability and scalability of production systems, improve system performance, and enhance observability through design and automation.
Top Skills:
AWS, CloudWatch, Datadog, Grafana, Prometheus, Terraform
Artificial Intelligence • Healthtech
The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.
Top Skills:
AWS, Containerized Services, Distributed Workflows, Observability Tooling, Postgres, Serverless Compute
Fintech • Professional Services • Software
As a Senior Site Reliability Engineer, you'll design scalable systems on AWS, mentor engineers, manage incident responses, and enhance the reliability of fintech infrastructure.
Top Skills:
Spark, AWS, DevOps, Java, Kubernetes, Terraform
Software
The Site Reliability Engineer will enhance reliability, observability, and incident response of You.com's production services, while collaborating with teams to implement best practices and improve operational efficiency through tooling and automation.
Top Skills:
AWS, Bash, CI/CD, EKS, GHA, Git, Grafana, OpenTelemetry, Prometheus, Python, Terraform
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
The Site Reliability Engineer will ensure the reliability and performance of AI infrastructure, build core systems, handle incident response, and develop automation tools.
Top Skills:
AWS, Datadog, ELK, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Pulumi, Python, Rust, Terraform
Artificial Intelligence • Machine Learning • Generative AI
The Site Reliability Engineer will manage production infrastructure, focusing on data-heavy systems and improving reliability across services, particularly using ClickHouse and Kafka.
Top Skills:
ClickHouse, Cloud Infrastructure, Kafka, Kubernetes, Snowflake, Terraform
Software
The Senior Site Reliability Engineer will lead service onboarding, maintain SLAs/SLOs, design secure infrastructure, automate operational tasks, and respond to incidents while ensuring system reliability and performance.
Top Skills:
AWS, CloudFormation, ELK Stack, Go, Grafana, Hadoop, Kubernetes, Python, Terraform