Get the job you really want.
Maximum of 25 job preferences reached.
Top Reliability Engineer Jobs in San Francisco, CA
Artificial Intelligence • Software
As a Senior/Staff Network Reliability Engineer, you'll optimize and maintain Fluidstack's network platform, ensuring performance and reliability for AI and HPC workloads. Responsibilities include tuning networking protocols, deploying and validating switches, automating telemetry, conducting root-cause analyses, and collaborating with vendors.
Top Skills:
BgpDpdkEbpfEvpnGeneveGoPythonRdmaRustTcp/IpVxlanXdp
Reposted 12 Days AgoSaved
Easy Apply
Easy Apply
Energy
As a Reliability Engineer, you will define reliability requirements, analyze design failures, run tests, and ensure high standards for hardware products.
Top Skills:
JmpMinitabPython
Artificial Intelligence • Healthtech • Machine Learning • Natural Language Processing • Software
The AWS Cloud Architect will design, build, and optimize cloud infrastructure, ensuring scalability and security while mentoring junior SREs and defining cloud strategy.
Top Skills:
AnsibleAws Api GatewayAws CloudfrontAws CloudtrailAws CloudwatchAws DocumentdbAws Ec2Aws EksAws LambdaAws RdsAws S3Aws Secrets ManagerAws SsmDockerGrafanaHashicorp ConsulHashicorp TerraformHashicorp VaultKubernetesNew RelicPrometheus
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
As a Senior Software Engineer focused on Site Reliability Tooling, you'll enhance system reliability, implement SRE practices, and build automation tools to support site reliability across Upstart's infrastructure.
Top Skills:
CdkCloudFormationDatadogGoJavaScriptKubernetesPrometheusPythonTerraformTypescript
Industrial • Manufacturing
Lead reliability strategy for HVAC components, develop predictive life models, design tests, and improve product durability. Analyze data and mentor junior engineers.
Top Skills:
Accelerated Test MethodsHvac SystemsIot DevicesPredictive Life ModelsWeibull Modeling
Appliances
Lead reliability strategies for HVAC components, develop predictive models, conduct tests, analyze data, and mentor junior engineers.
Top Skills:
Accelerated Life TestingCorrosion TestingEnvironmental ChambersHvac SystemsReliability EngineeringVibration TablesWeibull Analysis
Sales • Software • Automation
Join the Infrastructure Team to build and maintain critical systems, automating database lifecycles and enhancing disaster recovery with a focus on resilience and simplicity.
Top Skills:
AnsibleArgocdAWSClickhouseDockerElasticsearchFlaskGithub ActionsGrafanaKubernetesMongoDBPostgresPythonRedisTerraform
Marketing Tech
The Cloud Reliability Engineer develops, configures, and deploys cloud tools, enhances applications, ensures observability, and participates in on-call rotations.
Top Skills:
AWSCi/CdDockerGithub ActionsGoGoogle BigqueryGCPKubernetesLinuxPythonSQLTerraform
Artificial Intelligence • Healthtech • Software
As a Site Reliability Engineer, you will manage cloud infrastructure, implement observability, and ensure system reliability by collaborating with engineering teams and maintaining databases.
Top Skills:
AzureBashGitGitKubernetesPostgresPythonRedisSQLTypescriptVscode
Reposted 10 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
This role involves building and maintaining observability services, ensuring service reliability, and collaborating with other teams on best practices.
Top Skills:
AWSFluentbitGCPJaegerKubernetesAzureQuickwitSplunkVectorVictoriametrics
Information Technology • Security • Cybersecurity
Responsible for managing Oracle RAC databases, optimizing performance, ensuring security and integrity, and providing 24x7 support for production applications.
Top Skills:
CassandraCephElasticsearchKafkaOracleRedis
Cloud
The role involves designing and optimizing PostgreSQL clusters, automating database tasks, and ensuring high availability and performance while collaborating with other engineering teams.
Top Skills:
AnsibleDatadogGoGrafanaKubernetesMySQLPostgresPrometheusPythonTerraform
New
Track Smarter, Apply Better.
Ditch the spreadsheets. Organize your job search with our freeApplication Tracker.
Use For Free
Artificial Intelligence • Cybersecurity
The Database Reliability Engineer will ensure database availability, performance, scalability, and security across AWS, collaborating with application and security teams.
Top Skills:
AWSCrossplaneDatadogGitlab Ci/CdKubernetesNoSQLOpensearchPostgresTerraform
Artificial Intelligence • Machine Learning • Software
As a Staff Site Reliability Engineer, you will enhance the reliability, scalability, and performance of production services by applying SRE principles, implementing observability practices, automating processes, and collaborating with engineering teams.
Top Skills:
AWSAzureCloudFormationDatadogDockerElk StackGCPGoGrafanaJaegerKubernetesOpentelemetryOpentofuPrometheusPythonTerraform
eCommerce • Legal Tech • Professional Services • Software • Data Privacy
The Site Reliability Engineer will ensure systems run smoothly, work with automation tools, resolve issues, and drive operational improvements.
Top Skills:
AWSAzureCloudFormationDockerGCPGrafanaKubernetesMemcachedNew RelicOpentelemetryPostgresPrometheusPulumiRedisSentryTerraform
Aerospace • Hardware • Logistics • Robotics • Software • Transportation
Design for Reliability Engineer responsible for ensuring the safety and reliability of drone-delivery systems through testing, statistical analysis, and innovative design solutions.
Top Skills:
JmpMatlabMinitabPythonReliasoft
Blockchain • Fintech • Payments • Financial Services • Cryptocurrency • Web3
The Site Reliability Engineer will build and maintain infrastructure, improve software systems, develop scalable microservices, and ensure quality software delivery.
Top Skills:
AWSGoGoogle Cloud PlatformJavaKubernetesAzureSQL
Productivity
The Senior Site Reliability Engineer will enhance site reliability through monitoring, optimizing infrastructure, collaborating on engineering projects, and ensuring systems’ stability.
Top Skills:
AWSDockerKubernetesTemporal
Artificial Intelligence • Machine Learning • Generative AI
The Software Engineer in Reliability will ensure system scalability, reliability, and performance, collaborating with teams to improve infrastructure and handle incidents.
Top Skills:
Cloud InfrastructureCloudFormationContainer Orchestration PlatformsContainerization TechnologiesDatadogGrafanaIac ToolsKubernetesMicroservices ArchitectureObservability ToolsProgramming LanguagesPrometheusService Mesh TechnologiesSplunkTerraform
Software • Generative AI
As a Site Reliability Engineer at Fireworks AI, you'll ensure system reliability, manage incidents, develop monitoring solutions, and reduce operational toil, while collaborating with software engineers to embed reliability in the development lifecycle.
Top Skills:
AWSAzureDockerElk StackGCPGoGrafanaKubernetesPrometheusPython
Artificial Intelligence • Machine Learning • Database
The role involves ensuring the reliability and performance of distributed database systems, developing monitoring strategies, and automating operations in a cloud-native environment.
Top Skills:
AnsibleArgoAWSAzureDockerGCPGitlab CiGoJavaJenkinsKubernetesPythonTerraform
Artificial Intelligence • Information Technology
As a Site Reliability Engineer, maintain user-facing services, implement best practices for reliability, and manage production incidents.
Top Skills:
AnsibleCloud ServicesKubernetesProgramming LanguagesTerraform
Reposted 20 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will support, maintain and grow the Atlas platform, focusing on automating processes and running multi-cloud environments.
Top Skills:
AWSAzureDnsGCPGoHTTPLinuxPythonRubyTls
Consumer Web • Mobile
As a Site Reliability Engineer at Patreon, you'll improve AWS infrastructure, implement SRE practices, enhance Kubernetes capabilities, and develop automation tools.
Top Skills:
AnsibleAWSChefKubernetesPuppetPythonTerraform
Artificial Intelligence • Healthtech • Information Technology • Software
As a Site Reliability Engineer, you will manage the production environment, focusing on infrastructure design, automation, and optimizing deployment pipelines to ensure high availability.
Top Skills:
HelmKafkaKubernetesPostgresPythonRedisTerraformTypescript
Top San Francisco Companies Hiring Reliability Engineers
See AllPopular Job Searches
All Filters
Total selected ()
No Results
No Results































