Staff Site Reliability Engineer (SRE) | Dev Ops Engineer #4770

GRAIL

Reposted 12 Days Ago

Hybrid

Menlo Park, CA, USA

169K-224K Annually

Senior level

Lead the design and operation of a fault-tolerant cloud infrastructure, implement infrastructure-as-code, manage Kubernetes reliability, and mentor engineers.

The summary above was generated by AI

Our mission is to detect cancer early, when it can be cured. We are working to change the trajectory of cancer mortality and bring stakeholders together to adopt innovative, safe, and effective technologies that can transform cancer care.

We are a healthcare company, pioneering new technologies to advance early cancer detection. We have built a multi-disciplinary organization of scientists, engineers, and physicians and we are using the power of next-generation sequencing (NGS), population-scale clinical studies, and state-of-the-art computer science and data science to overcome one of medicine’s greatest challenges.

GRAIL is headquartered in the Bay Area of California, with locations in Washington, D.C., North Carolina, and the United Kingdom. It is supported by leading global investors and pharmaceutical, technology, and healthcare companies.

For more information, please visit grail.com

GRAIL is seeking a Staff Site Reliability / DevOps Engineer to lead the reliability, scalability, and security of our cloud-native platform. This role operates at the intersection of infrastructure engineering, platform strategy, and organizational leadership, supporting systems that power large-scale data processing and cutting-edge cancer detection technologies.

You will define and drive infrastructure standards across teams, represent reliability and performance in architecture decisions, and build systems that scale well beyond your direct ownership. This is a highly technical, high-impact role combining hands-on engineering with cross-functional influence and mentorship.

Onsite Expectations

You will work on-site full-time at our office located in Menlo Park, California. Beginning in Fall 2026, you will work at our new headquarters in Sunnyvale, California.

Responsibilities

Design, build, and operate highly available, fault-tolerant cloud infrastructure across AWS, GCP, and/or Azure
Architect and maintain scalable CI/CD pipelines and deployment frameworks for enterprise-grade software delivery
Lead infrastructure-as-code adoption and maturity using tools such as Terraform, CloudFormation, and Ansible
Own Kubernetes reliability across multi-cluster environments, including upgrades, scaling, and workload lifecycle management
Establish and evolve observability platforms (metrics, logs, traces) and define SLO/SLI frameworks across teams
Lead incident response for critical outages, drive root cause analysis, and implement preventative improvements
Optimize infrastructure for cost, performance, and scalability, partnering closely with engineering and finance stakeholders
Define and enforce DevOps, reliability, and security best practices across the organization
Partner cross-functionally with engineering, data, QA, security, and IT teams to design resilient systems
Mentor engineers and contribute to technical leadership through design reviews, standards, and knowledge sharing

This list summarizes the role’s primary responsibilities and is not exhaustive. Responsibilities may change at the company’s discretion.

What Success Looks Like in Your First Year

Conduct a comprehensive assessment of the current infrastructure, drive infrastructure-as-code adoption to 95%+ across critical systems, and establish clear health and reliability baselines for the Kubernetes platform
Standardize observability using modern tooling and implement an SLO/SLI framework adopted across multiple product teams, including defined SLAs for critical data systems
Strengthen security and compliance posture across cloud environments by implementing consistent baselines, launching a compliance-as-code framework, and reducing mean time to resolution (MTTR) for production incidents
Define, document, and drive adoption of engineering standards, best practices, and operational guidelines across platform and product teams
Develop and align stakeholders on a forward-looking platform reliability and infrastructure roadmap
Demonstrate measurable mentorship and technical leadership impact across the engineering organization
Evaluate and provide recommendations on emerging infrastructure needs, including support for AI/ML and advanced data workloads

Required Qualifications

BS in Computer Science, Engineering, or related field, or equivalent experience
8+ years of experience in Site Reliability Engineering, DevOps, or platform engineering
Strong hands-on experience with at least one major cloud platform (AWS, GCP, or Azure)
Experience implementing infrastructure-as-code solutions (Terraform, CloudFormation, or similar)
Experience designing and operating CI/CD pipelines (e.g., GitLab CI, GitHub Actions, Jenkins)
Hands-on experience with Kubernetes and containerized systems in production environments
Proficiency in scripting or programming for automation (e.g., Python, Go, Bash, or PowerShell)
Experience with observability and monitoring tools (e.g., Prometheus, Grafana, OpenTelemetry, Datadog)
Strong understanding of networking, security, and distributed systems fundamentals
Experience working in regulated environments and familiarity with frameworks such as ISO 27001, NIST, SOC 2, or HIPAA

Preferred Qualifications

10+ years of experience in SRE, DevOps, or infrastructure engineering
Experience operating multi-cluster Kubernetes environments (e.g., EKS, GKE) at scale
Familiarity with GitOps practices (e.g., ArgoCD, Flux)
Experience with data platforms and pipelines (e.g., Kafka, Airflow, Spark, Snowflake, BigQuery)
Experience implementing SLO/SLI frameworks and reliability practices across multiple teams
Strong background in cloud security, including IAM, zero-trust architecture, and secrets management
Experience with compliance-as-code and security tooling (e.g., OPA, Snyk, Checkov)
Exposure to AI/ML or large-scale data infrastructure workloads
Experience in healthcare, biotech, or other regulated industries
Relevant cloud or Kubernetes certifications (e.g., AWS DevOps, CKA/CKS, GCP DevOps)

Physical Demands and Working Environment

Standard office environment with hybrid flexibility
Participation in on-call rotation and after-hours support for critical systems may be required
Frequent collaboration with cross-functional and senior stakeholders
Fast-paced, dynamic environment with emphasis on reliability, scalability, and innovation

Adaptability and Growth Expectation

As the organization evolves, responsibilities may expand or shift to meet business needs. This may include:

Taking on additional technical or leadership responsibilities
Participating in cross-functional initiatives and strategic projects
Adapting to new technologies, tools, and methodologies
Supporting other teams during periods of high demand

The expected, full-time, annual base pay scale for this position is $169K - $224K. Actual base pay will consider skills, experience, and location.

This role may be eligible for other forms of compensation, including an annual bonus and/or incentives, subject to the terms of the applicable plans and Company discretion. This range reflects a good-faith estimate of the range that the Company reasonably expects to pay for the position upon hire; the actual compensation offered may vary depending on factors such as the candidate’s qualifications. Employees in this role are also eligible for GRAIL’s comprehensive and competitive benefits package, offered in accordance with our applicable plans and policies. This package currently includes flexible time-off or vacation; a 401(k) retirement plan with employer match; medical, dental, and vision coverage; and carefully selected mindfulness programs.

GRAIL is an equal employment opportunity employer, and we are committed to building a workplace where every individual can thrive, contribute, and grow. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, gender, gender identity, sexual orientation, age, disability, status as a protected veteran, or any other class or characteristic protected by applicable federal, state, and local laws. Additionally, GRAIL will consider for employment qualified applicants with arrest and conviction records in a manner consistent with applicable law and provide reasonable accommodations to qualified individuals with disabilities. Please contact us at [email protected] if you require an accommodation to apply for an open position.

GRAIL maintains a drug-free workplace. We welcome job-seekers from all backgrounds to join us!

GRAIL is headquartered in Menlo Park, California, with locations in Washington, D.C., North Carolina, and the United Kingdom. We also have a number of employees who work remotely. Our Bay Area office has employees working in our labs, software engineering, clinical development, and more.
