Top Reliability Engineer Jobs in San Francisco, CA
Seeking a Senior Site Reliability Engineer with expertise in designing and operating large-scale distributed systems in the cloud, with a focus on FedRAMP-compliant infrastructure. Responsibilities include collaborating with software engineers, designing and managing infrastructure, ensuring compliance with FedRAMP controls, driving automation, and maintaining cloud-native services on AWS.
Join RingCentral as a Senior Site Reliability Engineer responsible for maintaining and operating monitoring systems and infrastructure. Collaborate with teams, ensure system reliability, and participate in incident resolution. Must-have skills include Linux, cloud platforms, and configuration management tools.
As a Site Reliability Engineer, you will design, scale, and manage AWS-backed infrastructure, automate system provisioning, ensure high availability, and improve monitoring capabilities. Requires 4+ years of SRE/DevOps experience with expertise in AWS services, container orchestration, and scripting languages. Compensation range: $114,000 to $175,000 USD.
As a Senior Site Reliability Engineer at Atlassian, you will be responsible for improving the performance and reliability of services, addressing root causes of incidents, and automating repetitive tasks. You will collaborate with the team to develop innovative solutions and ensure high code quality, operating at scale in Amazon Web Services. Strong skills in Bash, Python, Linux, AWS, Ansible, Docker, Kubernetes, and ITIL are required.
Join Verkada as a Site Reliability Engineer - Infrastructure and be responsible for managing and scaling the infrastructure. You will work on optimizing cost efficiency, enforcing security requirements, improving monitoring and alerting, and adopting new technologies. This role requires a solid understanding of scripting languages and experience with cloud platforms like AWS, as well as Kubernetes and Terraform.
As a Principal Site Reliability Engineer, you will focus on innovating and providing strong technical vision for our platform's mission-critical datastores. You will build reliable, scalable, and highly available datastores on a multi-region platform. You will collaborate with leaders across the company as a subject matter expert and be a role model for the engineering team.
As a Site Reliability Engineer at Orb, you will play a critical role in maintaining and scaling our robust infrastructure, ensuring stability, scalability, and performance. You will be at the heart of tackling some of the most significant engineering challenges, from scaling our data ingestion pipelines to refining our observability and reliability practices.
As a Senior Site Reliability Engineer at Cisco Meraki, you will be responsible for building highly scalable cloud infrastructure and supporting critical customer infrastructure. Your role will involve automating processes, deploying new technologies, and improving infrastructure efficiency. Collaboration with vendors, data center operations, and cross-functional teams is essential. This role requires 24/7 on-call support and technical project delivery. Ideal candidates will have experience in leading large technical projects and a strong background in Linux, automation, and infrastructure management.
Featured Jobs
Design and develop scalable and stable test systems and frameworks for a large-scale Hardware in the Loop (HIL) test platform at Cruise. Influence the technical roadmap, develop software platforms, tools for monitoring and reporting, and automate routine tasks. Mentor junior engineers and drive reliability and scalability improvements through automation.
Guide the technical design, implementation, and optimization of global infrastructure services primarily focused on Hybrid Cloud. Research and recommend new technology solutions, ensure reliability and redundancy, manage projects, automate operational activities, integrate systems technologies with ServiceNow platform, and ensure legal compliance.
The Staff SRE - Technical Duty Officer at ServiceNow supports and protects all of ServiceNow's public services, providing technical leadership for a team of on-site engineers responsible for the availability and performance of ServiceNow's cloud platform. This role involves coordinating recovery efforts and crisis management during major outages.
RingCentral is seeking a Senior Site Reliability Engineer to work on infrastructure solutions, Docker infrastructure, automation, and deployment activities. Responsibilities include production support, research, development, Infrastructure as Code (IaC) with Terraform, CI/CD processes, and collaboration with teams.
As a Sr. Database Reliability Engineer at Quizlet, you will be responsible for planning, managing, and scaling the data layer to be resilient and performant. You will collaborate with engineering teams on architectural decisions, advise on data modeling, and improve database reliability awareness. This hybrid role involves monitoring, supporting, and optimizing both RDBMS and streaming technologies.
Looking for a reliability expert to join our growing SRE teams. Must have deep understanding of modern Cloud Infrastructure and operational best practices. Responsible for driving change across services and processes to improve reliability, performance, scalability, and cost efficiency. Proficiency in Java, Go, or Python is required. Remote-friendly opportunity.
Design and implement production-grade systems, establish standards for automation, plan complex migrations, improve on-call experience, and lead technical roadmaps for system reliability and scalability.
The Operations Reliability Engineer at Apple is responsible for guiding development and operations teams towards generating reliable designs for new technology components and products. They lead operational readiness activities, analyze test correlation, summarize reliability results, and drive improvements in reliability testing and product ramp.
Develop and lead the reliability program for new products at Span, including creating test plans, conducting testing, analyzing data, and managing a reliability team. Build and manage the in-house reliability lab and collaborate with internal and external teams for product testing. Requires 8+ years of experience in reliability testing and a BS or MS in Mechanical or Electrical Engineering.
The Apple Service Engineering - Redis SRE team is looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments.
Lead introduction and qualification of new products with high quality in the Silicon Technologies group at Apple. Drive product, process, and package qualification and reliability stress tests from design to completion. Must have strong communication skills and experience in semiconductor engineering and reliability standards.
The Software Engineer in Reliability Engineering role at Grammarly involves building world-class, secure, and reliable cloud-native infrastructure solutions for Grammarly engineers. Responsibilities include improving incident management, introducing auto-scaling and resilience mechanisms, conducting chaos testing, and establishing best practices for reliability.
As a SoC Silicon Reliability Engineer, you will drive SoC product, process, and package qualification and reliability tests. You will be responsible for failure analysis during product or package qualification, applying quality standards, and implementing quality improvement processes with suppliers.
As a Staff Database Reliability Engineer at Quizlet, you will execute large scale data migrations, work with product teams to plan for data needs, participate in data governance initiatives, provide operational support for databases, and drive decomposition of databases for a microservice architecture shift.
Build fast, highly available infrastructure at scale. Contribute to architecture and design of new and current systems. Solid understanding of infrastructure design. Write high quality code and use modern infrastructure tools. Experience with logging, monitoring, and security.
Site Reliability Engineer at Apple working on building and running distributed storage systems to support critical services. Involves solving unique challenges using deep understanding of storage, data analysis, programming, and teamwork.
Apple Services Engineering is seeking a Senior Site Reliability Engineer experienced in software and systems to join the Storage SRE team. Responsibilities include architectural and technical leadership for operating large scale distributed storage systems, driving best practices in resiliency, and designing and developing code in Go, Rust, Java, and Python.