Top Reliability Engineer Jobs in San Francisco, CA
CrowdStrike is seeking a Senior Systems Engineer to join the Data Services team, responsible for maintaining and automating data components like Cassandra, ElasticSearch, Kafka, and more. The role involves working on large-scale cloud-based systems and ensuring the security and availability of critical business data.
Build software for automating the operation of distributed systems, ensure security, document work, contribute to open source, and improve code quality.
Join CrowdStrike as a Data Reliability Engineer III working on building and developing data solutions to ensure accuracy and consistency across diverse sources. Collaborate on designing and implementing data pipelines, automating quality checks, and mentoring junior team members. Play a key role in maintaining data integrity and efficiency in a large-scale production environment.
Looking for a reliability expert to join the SRE teams at Atlassian, with expertise in scaling Cloud services and driving reliability methodologies. Responsibilities include improving services for higher reliability, performance, scalability, and cost efficiency. This role reports to a Senior Engineering Manager.
Webflow is seeking a Senior Site Reliability Engineer to improve the reliability and stability of its customer-facing infrastructure serving millions of page views. Responsibilities include empowering engineers on other teams, enhancing application reliability in Kubernetes, collaborating with customer support teams, and participating in on-call and incident response processes.
As a Site Reliability Engineer at Alchemy, you will be responsible for setting high standards for reliability, developing and owning reliability best practices, architecting production infrastructure, collaborating with engineering teams, and continuously improving infrastructure and systems. The role requires at least 6 years of experience in infrastructure engineering focused on reliability and expertise in various tools and practices related to observability, cloud infrastructure, containerization, and CI/CD.
As a Senior Site Reliability Engineer at Cisco Meraki, you will be responsible for building highly scalable cloud infrastructure and supporting critical customer infrastructure. Your role will involve automating processes, deploying new technologies, and improving infrastructure efficiency. Collaboration with vendors, data center operations, and cross-functional teams is essential. This role requires 24/7 on-call support and technical project delivery. Ideal candidates will have experience in leading large technical projects and a strong background in Linux, automation, and infrastructure management.
Seeking a Lead Site Reliability Engineer to build a reliable web experience for users. Responsibilities include implementing SRE practices, automation, incident management, observability, and leading cross-functional projects. Requires 10+ years as a software engineer and 5+ years as an SRE.
Featured Jobs
As a Senior Site Reliability Engineer at Atlassian, you will be responsible for improving the performance and reliability of services, addressing root causes of incidents, and automating repetitive tasks. You will collaborate with the team to develop innovative solutions and ensure high code quality, operating at scale in Amazon Web Services. Strong skills in Bash, Python, Linux, AWS, Ansible, Docker, Kubernetes, and ITIL are required.
Seeking a Senior Site Reliability Engineer with expertise in designing and operating large-scale distributed systems in the cloud, with a focus on FedRAMP-compliant infrastructure. Responsibilities include collaborating with software engineers, designing and managing infrastructure, ensuring compliance with FedRAMP controls, driving automation, and maintaining cloud-native services on AWS.
Upgrade, a fintech company, is seeking a Senior Site Reliability Engineer to enhance the reliability, performance, and scalability of systems in a completely remote role. Responsibilities include building a resilient platform, automating deployment and incident response, monitoring and troubleshooting, and collaborating with various stakeholders. Requires 5+ years of SRE/DevOps experience in a cloud environment, proficiency in AWS services and scripting languages, and knowledge of system troubleshooting.
Looking for an experienced Site Reliability Engineering Manager to lead the Compute infrastructure team at Roblox. Responsibilities include driving reliability, championing production health, and building/implementing systems to improve scalability and efficiency. Must have team leadership skills, system expertise, and over 3 years of engineering management experience. Bachelor's degree in Computer Science or related field required.
Join Verkada as a Site Reliability Engineer - Infrastructure and be responsible for managing and scaling the infrastructure. You will work on optimizing cost efficiency, enforcing security requirements, improving monitoring and alerting, and adopting new technologies. This role requires a solid understanding of scripting languages and experience with cloud platforms like AWS, as well as with Kubernetes and Terraform.
As a Principal Site Reliability Engineer, you will focus on innovating and providing strong technical vision for our platform's mission-critical datastores. You will build reliable, scalable, and highly available datastores on a multi-region scale platform. You will collaborate with leaders across the company as a subject matter expert and be a role model for the engineering team.
The Staff Site Reliability Engineer for Data Engineering at Crunchyroll is responsible for maintaining and enhancing the reliability of data infrastructure, collaborating with data and software engineers, and driving automation and best practices for monitoring and alerting. This role requires extensive experience in site reliability engineering and database operations with a focus on data platforms, AWS cloud platform knowledge, and proficiency in monitoring tools and automation frameworks.
Join the Platform Engineering team at Huntress and work on building, monitoring, and implementing infrastructure for the Huntress Security Platform. Responsibilities include automation of operational tasks, supporting observability capabilities, and enhancing software architecture for scalability and reliability. Required skills include AWS architecture, Linux-based infrastructure, coding proficiency in Ruby, Go, and Python, automation using Terraform, and experience with observability tools like DataDog and NewRelic.
Develop and lead the reliability program for new products at Span, including creating test plans, conducting testing, analyzing data, and managing a reliability team. Build and manage the in-house reliability lab and collaborate with internal and external teams for product testing. Requires 8+ years of experience in reliability testing and a BS or MS in Mechanical or Electrical Engineering.
Design and implement production-grade systems, establish standards, plan and execute migrations, improve the on-call experience, and lead technical roadmaps.
Build fast, highly available infrastructure at scale. Contribute to architecture and design of new and current systems. Solid understanding of infrastructure design. Write high quality code and use modern infrastructure tools. Experience with logging, monitoring, and security.
Babylist is looking for a Staff Software Engineer, Site Reliability to ensure the stability, scalability, and reliability of systems. The role involves supporting shared infrastructure and driving continuous improvement through expertise in site reliability engineering, AWS cloud infrastructure, and DevOps practices.
As a Senior Site Reliability Engineer at Boomi, you will be responsible for developing sophisticated systems and software based on customers' business goals. You will actively participate in incident detection and resolution, take part in the on-call rotation, and collaborate with various engineering teams to enhance Boomi offerings. Improving scalability and reliability through automation and implementing best practices will be key aspects of this role.
Looking for a reliability expert to join our growing SRE teams. Must have deep understanding of modern Cloud Infrastructure and operational best practices. Responsible for driving change across services and processes to improve reliability, performance, scalability, and cost efficiency. Proficiency in Java, Go, or Python is required. Remote-friendly opportunity.
As a Site Reliability Engineer at Boomi, you will design, build, and maintain infrastructure as code, participate in detecting and remediating production incidents, collaborate with engineering teams, and implement best practices for observability and monitoring.
As a Senior Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and scalability of the company's infrastructure and applications. This role involves designing and implementing highly automated systems, collaborating with software engineering teams, and maintaining system health and performance. You will also be involved in disaster recovery planning, incident response, and on-call rotations to ensure 24/7 application availability.
As a Sr. Database Reliability Engineer at Quizlet, you will be responsible for planning, managing, and scaling the data layer to be resilient and performant. You will collaborate with engineering teams on architectural decisions, advise on data modeling, and improve database reliability awareness. This hybrid role involves monitoring, supporting, and optimizing both RDBMS and streaming technologies.
Top San Francisco Companies Hiring Reliability Engineers