PsiQuantum

Site Reliability Engineer

Reposted 12 Days Ago
In-Office
Palo Alto, CA
120K-140K Annually
Senior level
As an SRE, you'll maintain service reliability, operate monitoring tools, automate tasks in Python, and manage incident responses.
The summary above was generated by AI

PsiQuantum’s mission is to build the first useful quantum computers—machines capable of delivering the breakthroughs the field has long promised. Since our founding in 2016, our singular focus has been to build and deploy million-qubit, fault-tolerant quantum systems. 

Quantum computers harness the laws of quantum mechanics to solve problems that even the most advanced supercomputers or AI systems will never reach. Their impact will span energy, pharmaceuticals, finance, agriculture, transportation, materials, and other foundational industries. 

Our architecture and approach are based on silicon photonics. By leveraging the advanced semiconductor manufacturing industry, including partners like GlobalFoundries, we use the same high-volume processes that already produce billions of chips for telecom and consumer electronics. Photonics offers natural advantages for scale: photons don’t feel heat, are immune to electromagnetic interference, and integrate with existing cryogenic cooling and standard fiber-optic infrastructure.

In 2024, PsiQuantum announced government-funded projects to support the build-out of our first utility-scale quantum computers in Brisbane, Australia, and Chicago, Illinois. These initiatives reflect a growing recognition that quantum computing will be strategically and economically defining—and that now is the time to scale. 

PsiQuantum also develops the algorithms and software needed to make these systems commercially valuable. Our application, software, and industry teams work directly with leading Fortune 500 companies—including Lockheed Martin, Mercedes-Benz, Boehringer Ingelheim, and Mitsubishi Chemical—to prepare quantum solutions for real-world impact. 

Quantum computing is not an extension of classical computing. It represents a fundamental shift—and a path to mastering challenges that cannot be solved any other way. The potential is enormous, and we have a clear path to make it real. 

Come join us. 

Job Summary: 

Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you’ll own the day‑to‑day operation of our monitoring stack—Grafana, Prometheus, Loki, and Tempo—crafting dashboards that surface golden signals and drive real‑time insight. You’ll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world‑class uptime across both on‑prem and AWS environments. 

Responsibilities: 

  • Define, implement, and iterate on Service Level Indicators & Service Level Objectives (SLIs/SLOs) and error budgets for critical services, with a focus on network reliability and data centre interconnects. 
  • Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation), extending coverage to network telemetry such as packet loss, jitter, bandwidth utilization, and BGP/EVPN stability. 
  • Operate and tune the observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low-latency telemetry ingestion and alerting for networking as well as compute layers. 
  • Drive incident response: triage, mitigate, perform post-incident reviews, and implement preventive actions—particularly for network-related outages, congestion, or misconfigurations. 
  • Develop automation and self-service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks, including network monitoring and diagnostics. 
  • Collaborate with Platform, Product, and Networking teams on capacity planning, performance testing, traffic engineering, and change management. 
  • Improve CI/CD health checks and release safety nets within GitLab, with attention to network dependencies in deployments. 
  • Contribute to Infrastructure as Code (Terraform, Ansible) for monitoring stack deployments and upgrades, including network observability tooling and configuration.

Experience/Qualifications: 

  • Bachelor’s Degree or higher in Computer Science, Engineering, or related technical field. 
  • 5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production. 
  • Hands-on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent). 
  • Proven track record designing dashboards and alerts around golden signals and USE/RED methodologies, extended to network utilization, saturation, and error metrics. 
  • Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines. 
  • Operational experience with Kubernetes and containerized workloads. 
  • Strong working knowledge of AWS services, data centre networking fundamentals, routing protocols, load balancing, and network overlays (e.g., VXLAN/EVPN). 
  • Experience running incident response and writing actionable post-mortems, including for network-related events. 
  • Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management. 
  • Exposure to regulated environments, multi-region networking architectures, and hybrid on-prem/cloud topologies is a plus. 
  • Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, networking, application, and data layers. 

 

PsiQuantum provides equal employment opportunity for all applicants and employees. PsiQuantum does not unlawfully discriminate on the basis of race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, military or veteran status, marital status, domestic partner status, sexual orientation, genetic information, or any other basis protected by applicable laws.

Note: PsiQuantum will only reach out to you using an official PsiQuantum email address and will never ask you for bank account information as part of the interview process. Please report any suspicious activity to [email protected].

We are not accepting unsolicited resumes from employment agencies.

The ranges below reflect the target base salary for a new hire. One is for the Bay Area (within 50 miles of HQ in Palo Alto); the second, if applicable, is for elsewhere in the US (beyond 50 miles of HQ). If there is only one range, it applies to the specific location where the position will be based. Actual compensation may fall outside these ranges and depends on various factors, including but not limited to a candidate's qualifications (relevant education and training, competencies, and experience), geographic location, and business needs. Base pay is only one part of the total compensation package. Full-time roles are eligible for equity and benefits. Base pay is subject to change and may be modified in the future.

U.S. Base Pay Range
$120,000-$140,000 USD
Bay Area Pay Range
$145,000-$165,000 USD

Top Skills

Ansible
AWS
Bash
GitLab
Grafana
Kubernetes
Loki
Prometheus
Python
Tempo
Terraform
HQ

PsiQuantum Palo Alto, California, USA Office

700 Hansen Way, Palo Alto, California, United States, 94304
