Top Reliability Engineer Jobs in San Francisco, CA

38+ Job Results
8 Days Ago
San Francisco, CA
Remote
1,050 Employees
5-7 Years of Experience
Cloud • Software
Seeking a Senior Site Reliability Engineer with expertise in designing and operating large-scale distributed systems in the cloud, with a focus on FedRAMP-compliant infrastructure. Responsibilities include collaborating with software engineers, designing and managing infrastructure, ensuring compliance with FedRAMP controls, driving automation, and maintaining cloud-native services on AWS.
Top Benefits:
401-K
401-K Matching
Adoption Assistance
+52 More
9 Days Ago
San Francisco, CA
Remote
7,000 Employees
76K-144K Annually
3-5 Years of Experience
Artificial Intelligence • Cloud • Productivity • Software
Join RingCentral as a Senior Site Reliability Engineer responsible for maintaining and operating monitoring systems and infrastructure. Collaborate with teams, ensure system reliability, and participate in incident resolution. Must-have skills include Linux, cloud platforms, and configuration management tools.
Top Benefits:
401-K
401-K Matching
Child Care Benefits
+54 More
8 Days Ago
San Francisco, CA
Remote
4,000 Employees
114K-175K Annually
3-5 Years of Experience
Artificial Intelligence • Fintech • Hardware • Information Technology • Sales • Software • Transportation
As a Site Reliability Engineer, you will design, scale, and manage AWS-backed infrastructure, automate system provisioning, ensure high availability, and improve monitoring capabilities. Requires 4+ years of SRE/DevOps experience with expertise in AWS services, container orchestration, and scripting languages. Compensation range: $114,000 to $175,000 USD.
Top Benefits:
401-K
401-K Matching
Commuter Benefits
+37 More
11 Days Ago
San Francisco, CA
Remote
9,500 Employees
144K-231K Annually
7+ Years of Experience
Cloud • Information Technology • Productivity • Security • Software
As a Senior Site Reliability Engineer at Atlassian, you will be responsible for improving the performance and reliability of services, addressing root causes of incidents, and automating repetitive tasks. You will collaborate with the team to develop innovative solutions and ensure high code quality, operating at scale in Amazon Web Services. Strong skills in Bash, Python, Linux, AWS, Ansible, Docker, Kubernetes, and ITIL are required.
Top Benefits:
401-K
401-K Matching
Adoption Assistance
+58 More
10 Days Ago
San Francisco, CA
1,700 Employees
130K-280K Annually
1-3 Years of Experience
Cloud • Hardware • Security • Software
Join Verkada as a Site Reliability Engineer - Infrastructure and be responsible for managing and scaling the infrastructure. You will work on optimizing cost efficiency, enforcing security requirements, improving monitoring and alerting, and adopting new technologies. This role requires a solid understanding of scripting languages and experience with cloud platforms such as AWS, as well as Kubernetes and Terraform.
Top Benefits:
401-K
Commuter Benefits
Company Equity
+45 More
12 Days Ago
San Francisco, CA
Hybrid
1,050 Employees
7+ Years of Experience
Cloud • Software
As a Principal Site Reliability Engineer, you will focus on innovating and providing strong technical vision for our platform's mission-critical datastores. You will build reliable, scalable, and highly available datastores on a multi-region scale platform. You will collaborate with leaders across the company as a subject matter expert and be a role model for the engineering team.
Top Benefits:
401-K
401-K Matching
Adoption Assistance
+52 More
13 Days Ago
San Francisco, CA
26 Employees
143K-231K Annually
5-7 Years of Experience
Enterprise Web • Software
As a Site Reliability Engineer at Orb, you will play a critical role in maintaining and scaling our robust infrastructure, ensuring stability, scalability, and performance. You will be at the heart of tackling some of the most significant engineering challenges, from scaling our data ingestion pipelines to refining our observability and reliability practices.
Top Benefits:
401-K
Commuter Benefits
Company Equity
+12 More
14 Days Ago
San Francisco, CA
3,000 Employees
112K-237K Annually
3-5 Years of Experience
Hardware • Information Technology • Security • Software • Cybersecurity • Generative AI
As a Senior Site Reliability Engineer at Cisco Meraki, you will be responsible for building highly scalable cloud infrastructure and supporting critical customer infrastructure. Your role will involve automating processes, deploying new technologies, and improving infrastructure efficiency. Collaboration with vendors, data center operations, and cross-functional teams is essential. This role requires 24/7 on-call support and technical project delivery. Ideal candidates will have experience in leading large technical projects and a strong background in Linux, automation, and infrastructure management.
Top Benefits:
401-K
401-K Matching
Adoption Assistance
+87 More
2 Days Ago
San Francisco, CA
2,500 Employees
143K-210K Annually
5-7 Years of Experience
Artificial Intelligence • Automotive • Machine Learning • Robotics • Transportation
Design and develop scalable and stable test systems and frameworks for a large-scale Hardware in the Loop (HIL) test platform at Cruise. Influence the technical roadmap, develop software platforms, tools for monitoring and reporting, and automate routine tasks. Mentor junior engineers and drive reliability and scalability improvements through automation.
Top Benefits:
401-K
401-K Matching
Commuter Benefits
+40 More
9 Days Ago
San Francisco, CA
Remote
20,000 Employees
7+ Years of Experience
Artificial Intelligence • Cloud • HR Tech • Information Technology • Software
Guide the technical design, implementation, and optimization of global infrastructure services primarily focused on Hybrid Cloud. Research and recommend new technology solutions, ensure reliability and redundancy, manage projects, automate operational activities, integrate systems technologies with ServiceNow platform, and ensure legal compliance.
Top Benefits:
401-K
401-K Matching
Adoption Assistance
+42 More
9 Days Ago
San Francisco, CA
Remote
20,000 Employees
143K-250K Annually
5-7 Years of Experience
Artificial Intelligence • Cloud • HR Tech • Information Technology • Software
The Staff SRE - Technical Duty Officer at ServiceNow supports and protects all of ServiceNow's public services, providing technical leadership for a team of on-site engineers responsible for the availability and performance of ServiceNow's cloud platform. This role involves coordinating recovery efforts and crisis management during major outages.
Top Benefits:
401-K
401-K Matching
Adoption Assistance
+42 More
14 Days Ago
San Francisco, CA
7,000 Employees
126K-180K Annually
5-7 Years of Experience
Artificial Intelligence • Cloud • Productivity • Software
RingCentral is seeking a Senior Site Reliability Engineer to work on infrastructure solutions, Docker infrastructure, automation, and deployment activities. Responsibilities include production support, research, development, infrastructure as code (IaC) with Terraform, CI/CD processes, and collaboration with teams.
Top Benefits:
401-K
401-K Matching
Child Care Benefits
+54 More
6 Days Ago
San Francisco, CA
Hybrid
145 Employees
130K-185K Annually
5-7 Years of Experience
Edtech
As a Sr. Database Reliability Engineer at Quizlet, you will be responsible for planning, managing, and scaling the data layer to be resilient and performant. You will collaborate with engineering teams on architectural decisions, advise on data modeling, and improve database reliability awareness. This hybrid role involves monitoring, supporting, and optimizing both RDBMS and streaming technologies.
2 Days Ago
San Francisco, CA
Remote
9,500 Employees
194K-312K Annually
7+ Years of Experience
Cloud • Information Technology • Productivity • Security • Software
Looking for a reliability expert to join our growing SRE teams. Must have deep understanding of modern Cloud Infrastructure and operational best practices. Responsible for driving change across services and processes to improve reliability, performance, scalability, and cost efficiency. Proficiency in Java, Go, or Python is required. Remote-friendly opportunity.
Top Benefits:
401-K
401-K Matching
Adoption Assistance
+58 More
13 Days Ago
San Francisco, CA
Hybrid
2,344 Employees
191K-279K Annually
7+ Years of Experience
Fintech • HR Tech
Design and implement production-grade systems, establish standards for automation, plan complex migrations, improve on-call experience, and lead technical roadmaps for system reliability and scalability.
Top Benefits:
401-K
Adoption Assistance
Commuter Benefits
+29 More
7 Days Ago
San Francisco, CA
165,000 Employees
3-5 Years of Experience
Hardware • Retail • Software • Wearables
The Operations Reliability Engineer at Apple is responsible for guiding development and operations teams towards generating reliable designs for new technology components and products. They lead operational readiness activities, analyze test correlation, summarize reliability results, and drive improvements in reliability testing and product ramp.
8 Days Ago
San Francisco, CA
94 Employees
140K-190K Annually
7+ Years of Experience
Greentech • Energy
Develop and lead the reliability program for new products at Span, including creating test plans, conducting testing, analyzing data, and managing a reliability team. Build and manage the in-house reliability lab and collaborate with internal and external teams for product testing. Requires 8+ years of experience in reliability testing and a BS or MS in Mechanical or Electrical Engineering.
15 Hours Ago
San Francisco, CA
165,000 Employees
3-5 Years of Experience
Hardware • Retail • Software • Wearables
The Apple Service Engineering - Redis SRE team is looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments.
13 Days Ago
San Francisco, CA
165,000 Employees
5-7 Years of Experience
Hardware • Retail • Software • Wearables
Lead introduction and qualification of new products with high quality in the Silicon Technologies group at Apple. Drive product, process, and package qualification and reliability stress tests from design to completion. Must have strong communication skills and experience in semiconductor engineering and reliability standards.
8 Days Ago
San Francisco, CA
Remote
1,000 Employees
225K-284K Annually
7+ Years of Experience
Artificial Intelligence • Information Technology • Machine Learning • Natural Language Processing • Productivity • Software • Generative AI
The Software Engineer in Reliability Engineering role at Grammarly involves building world-class, secure, and reliable cloud-native infrastructure solutions for Grammarly engineers. Responsibilities include improving incident management, introducing auto-scaling and resilience mechanisms, conducting chaos testing, and establishing best practices for reliability.
Top Benefits:
401-K
Commuter Benefits
Company Equity
+29 More
14 Days Ago
San Francisco, CA
165,000 Employees
3-5 Years of Experience
Hardware • Retail • Software • Wearables
As a SoC Silicon Reliability Engineer, you will drive SoC product, process, and package qualification and reliability tests. You will be responsible for failure analysis during product or package qualification, applying quality standards, and implementing quality improvement processes with suppliers.
15 Days Ago
San Francisco, CA
Hybrid
145 Employees
150K-215K Annually
7+ Years of Experience
Edtech
As a Staff Database Reliability Engineer at Quizlet, you will execute large scale data migrations, work with product teams to plan for data needs, participate in data governance initiatives, provide operational support for databases, and drive decomposition of databases for a microservice architecture shift.
22 Days Ago
San Francisco, CA
300 Employees
180K-225K Annually
7+ Years of Experience
Cloud • Greentech • Other • Energy
Build fast, highly available infrastructure at scale. Contribute to architecture and design of new and current systems. Solid understanding of infrastructure design. Write high quality code and use modern infrastructure tools. Experience with logging, monitoring, and security.
Top Benefits:
401-K
401-K Matching
Commuter Benefits
+34 More
3 Days Ago
San Francisco, CA
165,000 Employees
3-5 Years of Experience
Hardware • Retail • Software • Wearables
Site Reliability Engineer at Apple working on building and running distributed storage systems to support critical services. Involves solving unique challenges using deep understanding of storage, data analysis, programming, and teamwork.
2 Days Ago
San Francisco, CA
165,000 Employees
7+ Years of Experience
Hardware • Retail • Software • Wearables
Apple Services Engineering is seeking a Senior Site Reliability Engineer experienced in software and systems to join the Storage SRE team. Responsibilities include architectural and technical leadership for operating large scale distributed storage systems, driving best practices in resiliency, and designing and developing code in Go, Rust, Java, and Python.