Top Reliability Engineer Jobs in San Francisco, CA

Reposted 17 Days Ago
Easy Apply
Remote
San Francisco Bay Area, CA
167K-231K Annually
Senior level
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
Lead technical direction for software architecture and cross-team initiatives focusing on scaling consumer-facing systems and maximizing loan originations while maintaining compliance and system integrity.
Top Skills: AWS, CI/CD, Docker, GitHub Actions, Infrastructure as Code, React, Ruby on Rails
23 Days Ago
In-Office
San Francisco Bay Area, CA
150K-180K Annually
Senior level
Hardware • Healthtech • Machine Learning • Software
Lead reliability engineering for electromechanical systems, including testing and validation of hardware to ensure performance and durability.
Top Skills: Accelerated Life Testing, FMEA, HALT/HASS, JMP, LabVIEW, MATLAB, Python, SPC, Weibull Analysis
Reposted 21 Hours Ago
In-Office
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Healthtech
The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.
Top Skills: AWS, Containerized Services, Distributed Workflows, Observability Tooling, Postgres, Serverless Compute
Reposted 23 Days Ago
In-Office
San Francisco Bay Area, CA
180K-230K Annually
Senior level
Energy • Renewable Energy
The Staff Reliability Engineer will ensure hardware reliability in high-voltage electronics, develop reliability test programs, and collaborate on design and testing across teams.
Top Skills: HV Electronics, Power Conversion, Python
Reposted Yesterday
In-Office
San Francisco Bay Area, CA
175K-250K Annually
Mid level
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
The Site Reliability Engineer will ensure the reliability and performance of AI infrastructure, build core systems, handle incident response, and develop automation tools.
Top Skills: AWS, Datadog, ELK, GCP, GitHub Actions, GitLab CI, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Pulumi, Python, Rust, Terraform
Reposted Yesterday
In-Office
San Francisco Bay Area, CA
195K-240K Annually
Senior level
Software
The Site Reliability Engineer will enhance reliability, observability, and incident response of You.com's production services, while collaborating with teams to implement best practices and improve operational efficiency through tooling and automation.
Top Skills: AWS, Bash, CI/CD, EKS, GHA, Git, Grafana, OpenTelemetry, Prometheus, Python, Terraform
Reposted 24 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
180K-279K Annually
Senior level
Fintech • Financial Services
The Staff Infrastructure Reliability Engineer leads Redfin's production database and storage systems, collaborating on strategies for reliability, scalability, and performance, while mentoring engineers and guiding complex technical discussions.
Top Skills: AWS, AWS Aurora, AWS RDS, AWS S3, DynamoDB, ElastiCache, OpenSearch, Postgres, Python, RDBMS
Reposted 2 Days Ago
Hybrid
San Francisco Bay Area, CA
163K-203K Annually
Senior level
Fintech
As a Senior Site Reliability Engineer, you will ensure the reliability, scalability, and security of Prosper's Cloud Platform while designing AI-assisted operations and mentoring junior engineers.
Top Skills: APM, CI/CD, Cloud, Infrastructure as Code, Kubernetes
3 Days Ago
In-Office
San Francisco Bay Area, CA
117K-209K Annually
Senior level
Big Data • Cloud • Digital Media • Machine Learning • Mobile • Software • Industrial
Lead reliability for Autodesk GovCloud services by deploying, operating, and automating production systems. Define SLOs/SLIs, build observability and automation, run incident response and on-call rotation, ensure compliance (FedRAMP), perform resilience testing and toil reduction, and collaborate across engineering, security, and platform teams to improve service reliability and operability.
Top Skills: APIs, AWS, AWS GovCloud, Azure, Bash, Caching Technologies, CI/CD, CloudWatch, Containers, Databases, Datadog, DNS, Dynatrace, FedRAMP, Go, IL4, IL5, Infrastructure as Code, Java, Kubernetes, Load Balancing, Messaging Systems, Networking, PowerShell, Python, Splunk, Storage Platforms
3 Days Ago
In-Office or Remote
San Francisco Bay Area, CA
114K-235K Annually
Mid level
Social Media
Operate, scale, and improve a cloud-native platform on AWS and Kubernetes. Manage GitOps deployments with ArgoCD and Helm, provision infra with Terraform/Terragrunt, build CI/CD automation, enhance observability, respond to incidents, reduce operational toil through scripting, and collaborate with security and application teams to improve reliability and platform guardrails.
Top Skills: ArgoCD, AWS, Bash, Containers, EKS, GitHub Actions, GitOps, Helm, IAM, Kubernetes, Linux, Python, Terraform, Terragrunt
Reposted 16 Days Ago
Remote
San Francisco Bay Area, CA
145K-180K Annually
Senior level
Legal Tech • Software
Lead automation and optimization of Filevine's data platform: performance-tune MSSQL/Postgres, optimize Snowflake, provision infrastructure with Terraform/AWS, run stateful containers on Kubernetes, integrate AI/LLM and MCP for operational automation, manage CI/CD, capacity planning, and documentation, and serve in a 24/7 on-call rotation.
Top Skills: AWS, C#, Dapper, Docker, DynamoDB, Entity Framework, GitLab, Kubernetes, LLMs, MCP (Model Context Protocol), Microsoft SQL Server (MSSQL), Octopus Deploy, OpenSearch, Postgres, PowerShell, Python, Redis, Snowflake, Terraform
Reposted 3 Days Ago
In-Office
San Francisco Bay Area, CA
350K-475K Annually
Mid level
Artificial Intelligence • Information Technology
The Site Reliability Engineer will drive reliability for the Tinker platform, focusing on incident response, monitoring, and ensuring system resilience while collaborating across teams.
Top Skills: Cloud Infrastructure, Kubernetes
Reposted 3 Days Ago
Hybrid
San Francisco Bay Area, CA
182K-225K Annually
Senior level
Fintech • Software
As a Senior Site Reliability Engineer, you'll build and scale internal platform offerings, design monitoring systems, and collaborate with software engineers to ensure application performance and reliability.
Top Skills: Ansible, AWS, CloudFormation, Datadog, Docker, ELK Stack, Grafana, gRPC, Java, Kubernetes, Postgres, Prometheus, Python, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
194K-267K Annually
Senior level
Cloud
The role involves building and managing observability infrastructure in GCP, automating deployments, and optimizing data processes for high reliability.
Top Skills: GKE, Go, GCP, Grafana, Kubernetes, OpenTelemetry, Python, Ruby, Splunk, Terraform
4 Days Ago
Hybrid
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Software
Design, build, and scale control- and data-plane infrastructure for distributed AI workloads. Improve reliability, performance, scheduling, and observability for Ray clusters across cloud and on-prem environments. Support accelerator integration, container image management, and provide on-call troubleshooting and cross-team collaboration.
Top Skills: AWS, Azure, Containers, GCP, Go, GPUs, Grafana, Kubernetes, Linux, Prometheus, Python, Ray, TPUs, VMs
Reposted 17 Days Ago
Remote
San Francisco Bay Area, CA
Senior level
Software
As a Senior DevOps / Platform Reliability Engineer, you will manage CI/CD pipelines, automate infrastructure, operate Kubernetes, and enhance observability while ensuring security and compliance for enterprise systems.
Top Skills: Argo CD, Aurora MySQL, AWS, Bash, CloudFormation, EKS, ElastiCache, GitHub Actions, Grafana, Kubernetes, Linux, MSK, OpenTelemetry, Prometheus, Python, S3, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
200K-350K Annually
Senior level
Artificial Intelligence
The SRE/Infrastructure Engineer will manage Terraform and Kubernetes across cloud platforms, ensuring scalable infrastructure. Responsibilities include multi-cloud deployments, observability, and creating reusable components.
Top Skills: AWS, Azure, Cloudflare, GCP, Kubernetes, Terraform
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
251K-336K Annually
Senior level
Digital Media • Gaming • News + Entertainment • Sports
As a Sr Principal Site Reliability Engineer, you will ensure maximum platform availability, lead incident response processes, drive automation, and collaborate across teams to optimize system performance and operational efficiency.
Top Skills: Automation Tools, Cloud Technologies, Content Delivery Networks, Media Streaming Technologies, Monitoring Tools
Reposted 4 Days Ago
In-Office
San Francisco Bay Area, CA
159K-230K Annually
Senior level
Artificial Intelligence • Big Data • Machine Learning • Software
The role involves designing and implementing custom installations of the C3 AI Platform for Federal customers, ensuring uptime, and automating system processes while collaborating with cross-functional teams.
Top Skills: Ansible, AWS, Azure, Bash, Kubernetes, Linux, Puppet, Python, Ruby, Terraform
Reposted 4 Days Ago
Hybrid
San Francisco Bay Area, CA
175K-225K Annually
Senior level
Artificial Intelligence • Machine Learning • Database
The role involves ensuring the reliability and performance of distributed database systems, developing monitoring strategies, and automating operations in a cloud-native environment.
Top Skills: Ansible, Argo, AWS, Azure, Docker, GCP, GitLab CI, Go, Java, Jenkins, Kubernetes, Python, Terraform
5 Days Ago
In-Office
San Francisco Bay Area, CA
Senior level
Artificial Intelligence • Healthtech • Software • Automation
Design, build, and operate Optura's multi-cloud, HIPAA-aware platform: run Kubernetes across cloud and customer on-prem/air-gapped environments, create unified deployment tooling (Helm/operators/GitOps), own SLOs/capacity/incident response, drive reliability, implement identity/networking/security controls, and build IaC/GitOps patterns in partnership with product and security teams.
Top Skills: AKS, Argo CD, AWS, Azure, Backstage, Cluster API, Crossplane, Distributed Tracing, EKS, GCP, GitOps, GKE, Go, Grafana, Helm, KMS, Kubernetes, mTLS, OIDC, OpenShift, OpenTelemetry, Operators, Prometheus, Pulumi, Python, Rancher, Replicated, Secrets Management, Service Mesh, Talos, Terraform, VPC
Reposted 23 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
126K-248K Annually
Senior level
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.
Top Skills: AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS
Reposted 23 Days Ago
Easy Apply
Remote or Hybrid
San Francisco Bay Area, CA
127K-249K Annually
Expert/Leader
Big Data • Cloud • Software • Database
Seeking a Site Reliability Engineer with expertise in networking and distributed systems for building secure multi-cloud infrastructure. Responsibilities include maintaining network architecture and ensuring reliable service-to-service communication, involving a 24/7 on-call rotation.
Top Skills: AWS, Azure, BGP, DNS, GCP, IPv6, Kubernetes, Load Balancing, mTLS, Service Mesh, TCP/IP, TLS, VPCs, VPNs
Reposted 5 Days Ago
In-Office
San Francisco Bay Area, CA
230K-490K Annually
Mid level
Artificial Intelligence • Machine Learning • Generative AI
The Software Engineer in Reliability will ensure system scalability, reliability, and performance, collaborating with teams to improve infrastructure and handle incidents.
Top Skills: Cloud Infrastructure, CloudFormation, Container Orchestration Platforms, Containerization Technologies, Datadog, Grafana, IaC Tools, Kubernetes, Microservices Architecture, Observability Tools, Programming Languages, Prometheus, Service Mesh Technologies, Splunk, Terraform
Reposted 5 Days Ago
In-Office
San Francisco Bay Area, CA
200K-275K Annually
Senior level
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills: Helm, Kafka, Kubernetes, Postgres, Python, Redis, Terraform, TypeScript