Luminary Cloud

SRE

Reposted 23 Days Ago

In-Office

San Mateo, CA, USA

Senior level

Design, implement, and maintain scalable backend systems and APIs; build cloud infrastructure (preferably GCP) using Terraform; operate containerized workloads with Kubernetes; ensure reliability, security, and performance; participate in on-call rotations, architecture discussions, and cross-functional delivery.

The summary above was generated by AI

About Luminary

Luminary helps engineering companies be more competitive by getting to market faster, creating new, better products, and reducing development risk. We do this with our Physics AI platform, the fastest and easiest way to build and deploy models to understand and instantly predict physical reality with precision. Customers span industries from automotive and aerospace to leading sporting equipment providers, including Otto Aviation, Joby Aviation, Piper Aircraft, and Trek Bikes. Luminary is a Series B company and is headquartered in San Mateo, California.

Role Description

The Luminary Physics AI platform is a SaaS offering that runs on GCP. It uses GPUs for data generation, model training, and model inference, and it supports accelerated engineering design workflows. The product generates and consumes large volumes of data for Physics AI models and is used by some of the most demanding customers in the automotive, aerospace, and defense industries. An elevated security and compliance posture, the need to maintain five-nines SLAs, automation of most tasks, and the management of large data volumes make this an exciting opportunity for a production Site Reliability Engineer.

The right candidate will apply software engineering principles to operations, focusing on system reliability, performance, and scalability. You will collaborate closely with engineering and product teams to design, deliver, and scale the core systems that power our platform. You will also be responsible for proposing product changes that let us support 10k simultaneous users on the platform while managing resources effectively.

Key Duties, Responsibilities, and Deliverables

Reliability & Operational Excellence

Participate in on-call rotations and incident response, implementing effective remediation strategies and leading post-incident reviews to prevent recurrence
Define, monitor, and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to meet internal and external reliability targets
Apply software engineering practices to eliminate toil by automating operational tasks, improving overall efficiency, and contributing to the operational reliability of the platform

Infrastructure & Platform Engineering

Develop and enhance our cloud infrastructure (GCP preferred) through automation and Infrastructure as Code (Terraform)
Develop, oversee, and maintain operational systems (from deployment pipelines to orchestration layers) ensuring application health, reliability, and scalability using containerized solutions like Kubernetes
Execute scalability and performance optimization strategies to ensure systems efficiently handle increasing workloads and future growth

Architecture & Systems Design

Contribute to the design and implementation of highly available, fault-tolerant systems, leveraging Service-Oriented Architecture (SOA) or microservices principles
Participate in architectural discussions that influence the platform’s long-term reliability, performance, and scalability

Security & Documentation

Collaborate with security experts to integrate IAM, authentication, authorization, encryption, and related best practices into the infrastructure
Create and maintain comprehensive documentation on system architecture, infrastructure, and security practices

Expertise and Qualifications

Required

Proven experience designing and implementing scalable SaaS backend systems.
Strong understanding of cloud infrastructure (GCP preferred), CI/CD pipelines, and core SRE/DevOps concepts.
5+ years of experience building performant, scalable, distributed systems (or equivalent experience).
10+ years of experience required for Senior/Lead candidates.
Proficiency in Golang and Python is highly desirable.
Familiarity with Kubernetes and container orchestration.
Experience with Infrastructure as Code (Terraform) and cloud automation.
Strong understanding of operational practices and willingness to participate in on-call rotations.
Knowledge of modern security principles and IAM fundamentals.

Preferred Qualifications (Senior Candidates)

Demonstrated success scaling infrastructure in a startup environment, including multicloud, hybrid, or on-prem deployments.
Proven experience mentoring and guiding engineers, supporting technical growth and career development.
Ability to act as a technical architect, making high-impact design decisions for reliable, scalable, and secure platform systems.

Top Skills

CI/CD
Cloud Automation
Container Orchestration
Google Cloud Platform
IAM
Infrastructure as Code
Kubernetes
Microservices
Python
Service-Oriented Architecture
Terraform

500 Arguello St, Suite 105, Redwood, California, United States, 94063
