Top Reliability Engineer Jobs in San Francisco
An opportunity at Cloudflare for a Core Systems Reliability Engineer to build software automating large distributed systems and ensure platform security. Requires experience in Go or Python, Docker deployment, networking, and debugging skills.
As a Core Systems Reliability Engineer at Cloudflare, you will build software for automating the operation of large distributed systems, ensure platform security, document work, contribute to the open-source community, and improve code quality. Required skills include Go or Python, experience in software engineering, Docker deployment, networking, debugging, and automation.
Staff Reliability Engineer responsible for integrating security practices into the workflow and ensuring continuous delivery of secure software and hardware solutions. Collaborates with development, operations, and security teams to identify vulnerabilities, implement security measures, and automate processes. Plays a key role in fostering a secure DevOps culture, improving security posture, and ensuring compliance with regulations.
Apple Vision Pro is seeking a Reliability Engineer to ensure and improve the durability and reliability of Apple's products. Responsibilities include developing new reliability tests, analyzing data, researching new technologies, and guiding design to improve product reliability. Ideal candidate is passionate about customer experience, has strong communication skills, attention to detail, and can handle multiple projects.
As a Reliability Engineer at Mill, you will be responsible for ensuring that Mill’s hardware products meet a very high quality bar for long-term device performance. You will work closely with the Design Engineers and Operations team to drive reliability testing throughout the supply chain and build internal lab capabilities for critical development test items.
Site Reliability Engineer role at Citadel Securities with responsibilities including system troubleshooting, capacity planning, and application deployment. Requires 3+ years of experience, knowledge of network systems, scripting languages, and trading desk support. Experience in Python development and operating system performance tuning preferred. Must have good collaboration and communication skills.
Seeking a Staff Site Reliability Engineer to improve security and reliability of critical cloud-based infrastructure for early cancer detection technology. Responsibilities include ensuring high availability, incident management, automation, performance optimization, security compliance, monitoring/alerting, and software development consultation.
Seeking a Site Reliability Engineer to design and implement SRE practices and ensure availability and scalability of production systems.
Featured Jobs
Operations Reliability Engineer role at Apple focusing on ensuring the reliability of new technology components and products through testing and failure analysis. Responsibilities include guiding development teams, leading FMEA sessions, analyzing test correlation, summarizing reliability results, and developing capable suppliers. Requires 7+ years of experience and strong skills in reliability testing, failure analysis, statistical process control, and data analytics.
As a Senior Site Reliability Engineer at Cisco Meraki, you will be responsible for building highly scalable cloud infrastructure and supporting critical customer infrastructure. Your role will involve automating processes, deploying new technologies, and improving infrastructure efficiency. Collaboration with vendors, data center operations, and cross-functional teams is essential. This role requires 24/7 on-call support and technical project delivery. Ideal candidates will have experience in leading large technical projects and a strong background in Linux, automation, and infrastructure management.
Seeking a Reliability Engineer with a background in complex mechanical systems related to electric vehicle powertrains. Responsibilities include test planning, risk assessment, root cause analysis, and driving reliable design choices. Requires 2+ years of experience in powertrain technologies and proficiency in reliability methods and testing.
RingCentral is seeking a Senior Site Reliability Engineer to work on infrastructure solutions, Docker infrastructure, automation, and deployment activities. Responsibilities include production support, research, development, IaC with Terraform, CI/CD processes, and collaboration with teams.
Lead the site reliability engineering effort to improve anomaly detection, platform stability, and resilience using modern best practices.
Looking for a reliability expert to join our growing SRE teams. Must have deep understanding of modern Cloud Infrastructure and operational best practices. Responsible for driving change across services and processes to improve reliability, performance, scalability, and cost efficiency. Proficiency in Java, Go, or Python is required. Remote-friendly opportunity.
The Site Reliability Engineer at ServiceNow is responsible for maintaining and developing the reliability, scalability, and performance of the infrastructure. The role involves a combination of software development, networking, and systems engineering expertise to improve services for customers.
As a Site Reliability Engineer at ServiceNow, you will maintain and develop the reliability, scalability, and performance of the infrastructure. Responsibilities include driving technical resolutions, software development, and reducing incidents. Required skills include DevOps, Automation, Linux, software development, and Cloud technologies.
Design and implement production-grade systems, establish standards and automation, plan and execute migrations, improve the on-call experience, and collaborate effectively within a team. Requires 2+ years of experience with scripting languages, a systems-thinking approach, problem-solving skills, cloud platform experience, and incident remediation.
Seeking a Senior Engineer for artifact management system design and maintenance, enforcing best practices, architecting cloud-agnostic solutions, and continuous improvement of developer experience in a global hybrid environment. Must have expertise in CI/CD, artifact management, IaC provisioning, and source code management services, as well as experience with Kubernetes at scale.
We are looking for a Principal Site Reliability Engineer with expertise in scaling Cloud services. The candidate should have deep understanding of modern Cloud infrastructure, programming expertise, and operational experience. They will be responsible for improving services and processes to enhance reliability, performance, scalability, and cost efficiency. The engineer will work with teams across the organization to advocate for reliability methodologies and will report to the Senior Engineering Manager.
Seeking a Senior Site Reliability Engineer with expertise in designing and operating large-scale distributed systems in the cloud, with a focus on FedRAMP-compliant infrastructure. Responsibilities include collaborating with software engineers, designing and managing infrastructure, ensuring compliance with FedRAMP controls, driving automation, and maintaining cloud-native services on AWS.
The Reliability Engineer at Apple will work on ensuring products meet customer expectations for robustness and reliability by conducting reliability tests, identifying failure modes, and collaborating with design teams. Responsibilities include developing reliability tests, failure analysis, and driving program decisions based on test results.
The Site Reliability Engineer will be responsible for providing 24x7 production support for Government Community Cloud infrastructure, driving technical resolutions across the technology stack, and improving platform operability and incident response.
Design reliability into low-voltage electronic systems of Tesla's Dojo supercomputer, collaborate with cross-functional engineering teams, lead design FMEA sessions, analyze field usage, facilitate failure analysis, provide reliability design guidelines, and apply reliability lessons learned for continuous improvement.
As a Senior Site Reliability Engineer at Atlassian, you will be responsible for improving the performance and reliability of services, addressing root causes of incidents, and automating repetitive tasks. You will collaborate with the team to develop innovative solutions and ensure high code quality, operating at scale in Amazon Web Services. Strong skills in Bash, Python, Linux, AWS, Ansible, Docker, Kubernetes, and ITIL are required.
The Cloud Site Reliability Engineer will improve and control hybrid-cloud usage, implement tooling and controls, and automate cloud operations. They will also ensure compliance with security standards and maintain high availability of the infrastructure.
Top San Francisco Companies Hiring Reliability Engineers