Top Reliability Engineer Jobs in San Francisco, CA
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Payments • Software • Financial Services
The Lead Site Reliability Engineer will drive reliability strategies, architect and maintain infrastructure, lead incident responses, and influence engineering practices for operational excellence while mentoring team members.
Top Skills:
AWS, Docker, FastAPI, Kubernetes, Postgres, Python, TypeScript, Vue
Artificial Intelligence • Software • Conversational AI • Generative AI
As a Staff Software Engineer in Site Reliability, you'll maintain production services, develop automation tools, and collaborate with teams to ensure system reliability and performance.
Top Skills:
CI/CD, GCP, Go, Kubernetes, Linux, Python, SQL, Terraform
Energy
As a Reliability Engineer, you will define system reliability requirements, conduct tests, analyze failures, and collaborate on design improvements for hardware products.
Top Skills:
JMP, Minitab, Python
Software
This role involves embedding with strategic customers to deliver technical solutions, manage AI/ML infrastructure, and ensure optimum performance of Kubernetes clusters.
Top Skills:
ArgoCD, CI/CD, Fluent Bit, Go, Grafana, Helm, Kubernetes, Prometheus, Python
Information Technology • Mobile • Software
As a Site Reliability Engineer, you'll ensure system reliability and scalability, automate processes, optimize performance, and collaborate on system design.
Top Skills:
AWS, Azure, Bash, CloudFormation, Datadog, Docker, ELK, Go, Google Cloud Platform, Grafana, Helm, Kubernetes, New Relic, Prometheus, Pulumi, Python, Terraform
Artificial Intelligence • Software
As a Site Reliability Engineer at Anyscale, you will ensure smooth operations of user-facing services, develop monitoring and alerting systems, implement incident management processes, and improve cloud service deployment methodologies.
Top Skills:
Alerting Systems, Automation, Cloud Components, Incident Management, Monitoring Systems
Edtech
The Senior Site Reliability Engineer will design automation tools, ensure uptime, manage Kubernetes clusters, optimize CI/CD systems, and collaborate on service resilience.
Top Skills:
ArgoCD, CircleCI, Datadog, GitHub Actions, Go, Istio, Kubernetes, Linux, Python, Terraform
Edtech
As a Staff Site Reliability Engineer, you will lead reliability engineering by designing automation, scaling systems, driving architectural improvements, and mentoring engineers.
Top Skills:
ArgoCD, CI/CD, CircleCI, Datadog, GCP, GitHub Actions, Go, Istio, Kubernetes, Python, Terraform
Edtech
Lead the technical vision for reliability at Quizlet by architecting self-healing systems, mentoring engineers, and improving infrastructure resilience.
Top Skills:
CI/CD, Datadog, Go, Istio, Jeli, Kubernetes (GKE), Python, Terraform
Artificial Intelligence • Software
As a Senior SRE, you'll enhance data infrastructure, optimize performance, build reliability, automate processes, and manage incident responses while supporting enterprise clients' uptime requirements.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript
Artificial Intelligence • Software
As Staff SRE Tech Lead, you'll oversee platform reliability and scalability, lead the SRE team, architect data infrastructures, and optimize systems while implementing automation and observability practices.
Top Skills:
ClickHouse, Go, Postgres, Python, TypeScript
Blockchain • Software • Cryptocurrency • Web3
Responsible for ensuring reliability and scalability of systems, maintaining AWS/GCP infrastructure, deploying applications, and improving operational processes.
Top Skills:
Ansible, AWS, DNS, ELK, GCP, HTTP, HTTPS, Jenkins, Kubernetes, Linux, Prometheus, Puppet, TCP, Terraform, UDP
Fintech • Software
The SRE is responsible for building cloud-native platforms, improving application reliability, and fostering collaboration within teams.
Top Skills:
CI/CD, Kubernetes, OpenShift, OpenStack, Prometheus, Splunk, VMware
Fintech
The Principal Site Reliability Engineer designs and implements software to enhance application performance and resilience while ensuring security standards. Responsibilities include automating application management, providing observability, and leading cross-functional teams. Mentorship and on-call rotation participation are expected.
Top Skills:
Aurora, AWS, Chef, Docker, DynamoDB, Git, Go, Java, Jenkins, JMS, Kafka, Kubernetes, Maven, Memcached, Oracle, Python, Redis, SQS, Swarm
Artificial Intelligence • Healthtech • Other • Productivity • Telehealth • Conversational AI • Generative AI
As a Founding Site Reliability Engineer, you'll enhance system reliability, automate tasks, manage incidents, and mentor others. You'll shape infrastructure and ensure optimal performance and stability for healthcare services.
Top Skills:
AWS, Azure, Datadog, GCP, Grafana, Honeycomb, Kubernetes, OpenTelemetry, PagerDuty, Prometheus, Sentry, Terraform
Artificial Intelligence • Marketing Tech • Software • Big Data Analytics
The Senior Site Reliability Engineer will design and maintain scalable infrastructure, improve system reliability, manage CI/CD pipelines, and collaborate across teams for operational excellence.
Top Skills:
Ansible, ArgoCD, AWS, Bash, Datadog, Docker, ELK, GitHub Actions, Grafana, Kubernetes, Linux, OpenTelemetry, Prometheus, Python, Terraform
Information Technology • Software • Big Data Analytics
The Site Reliability Engineer will design, analyze, and troubleshoot large-scale distributed systems, focusing on operating systems and performance tuning.
Top Skills:
Apache, Java
Software
Join Lambda to scale their high-performance cloud network. Responsibilities include automating network deployments, managing SDNs, and ensuring network availability.
Top Skills:
Ansible, CI/CD, Git, Helm, Kubernetes, Neutron, OpenStack, OVN, Python, Terraform
Information Technology
The Lead Observability Engineer is responsible for defining and implementing observability strategies, tools, and patterns to ensure reliable performance across various systems at Vivun.
Top Skills:
Celery, Datadog, Grafana, Honeycomb, LangChain, Node.js, Observe, OpenAI APIs, OpenTelemetry, Prometheus, Python
Semiconductor • Energy
The Reliability Test Engineer will design and commission custom test rigs, perform various accelerated life tests, develop dashboards, and analyze test data for thermal systems.
Top Skills:
Data Acquisition Systems, Electrical Testing Equipment, HMI, Multimeters, Oscilloscopes, PLC Programming, Power Meters, Process Sensors, Python, SCADA, Thermocouples
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
The Senior Site Reliability Engineer manages production infrastructure, ensuring performance and reliability using AI tools, Kubernetes, and CI/CD pipelines while mentoring teams.
Top Skills:
Apache Airflow, AWS, AWS Lambda, Azure, ChatGPT, CI/CD, Crossplane, GCP, Gemini, GitHub Copilot, Go, Kubernetes, OpenSearch, Postgres, Python, Redis, Snowflake, Terraform
Healthtech • Insurance
The Senior Software Engineer will lead complex projects, mentor engineers, and ensure cloud infrastructure is resilient and automated. Responsibilities include developing software, managing production environments, and enforcing coding standards.
Top Skills:
ArgoCD, AWS, GCP, GitHub Actions, Grafana, Istio, Kubernetes, Prometheus, Terraform
Financial Services
As a Site Reliability Engineer, you will enhance and monitor production systems, automate workflows, and respond to incidents to maintain system reliability.
Top Skills:
Airflow, Bazel, Git, Go, Grafana, gRPC, Jenkins, Kubernetes, Linux, Pandas, Postgres, Prometheus, Python, R, Relational Databases, SQL
Fintech • Payments
The Senior Staff SRE leads reliability engineering initiatives, drives operational excellence, mentors staff, and influences architecture to enhance system reliability and performance.
Top Skills:
AI/ML, AWS, Azure, Docker, ELK Stack, GCP, Grafana, Kubernetes, MySQL, NoSQL, Postgres, Splunk
Healthtech • Software
The Database Reliability Engineer manages and maintains cloud-based database infrastructures for SaaS applications, focusing on automation, process improvement, and collaboration with engineering teams.
Top Skills:
Ansible, AWS, Azure, Azure Data Factory, C#, Databricks, GCP, Git, Grafana, InfluxDB, MySQL, Postgres, PowerShell, Python, SQL, SQL Server, Terraform