Site Reliability Engineer

Bolt Graphics

Reposted 16 Days Ago

In-Office

Sunnyvale, CA, USA

145K-175K Annually

Senior level

The Site Reliability Engineer will design, implement, and manage reliable infrastructure and services, ensuring operational excellence and uptime.

The summary above was generated by AI

Bolt Graphics is a semiconductor startup based in Sunnyvale, CA, building the fastest and most efficient graphics processors. We pride ourselves on our first-principles approach to solving problems. We are energized by our mission to lower the barrier to entry for content creation and consumption. Our goal is to enable everyone to easily create, simulate, and consume immersive experiences as vividly as they can imagine them.

Our Values

Be Fearless: Unmute yourself. Test boundaries and get proven right.
Remain Adaptable: Stay comfortable in a continuously changing world. If you’re wrong, concede and move on.
Educate Your Ego: Selflessly collaborate towards our shared purpose.

About the role:

Bolt Graphics is seeking a highly experienced Site Reliability Engineer (SRE) to design, build, and operate highly reliable developer and production systems. This role is mission-critical to maintaining uptime, performance, and operational excellence across compute, storage, and networking environments. Exceptional Linux expertise and advanced automation capabilities are mandatory for success in this role.

What you'll do:

Design, implement, and operate highly available, fault-tolerant infrastructure and services.
Install, maintain, and upgrade server, storage, and networking hardware in office and colocation facilities.
Continuously monitor developer and production environments and proactively remediate reliability risks.
Participate in an on-call rotation and lead incident response efforts, including rapid triage, mitigation, and post-incident root cause analysis.
Respond effectively under pressure to outages and degradation events to restore service availability.
Develop, maintain, and continuously improve automation and operational tooling using Bash and Python.
Partner closely with engineering teams to support development, testing, and production workloads at scale.

Qualifications (required):

5-7 years of experience managing SRE-related functions.
Expert-level Linux systems administration across complex, production environments (this is a core requirement).
Exceptional proficiency in Bash and Python; advanced scripting and automation skills are mandatory, not optional.
Proven ability to write maintainable automation and diagnostic tooling for large-scale systems.
Deep understanding of server hardware, storage subsystems, and datacenter operations.
Hands-on experience with virtualization platforms including Proxmox (current), VMware vSphere, and/or OpenShift.
Strong experience with containerization technologies (Docker, containerd) and orchestration platforms (Kubernetes).
Experience operating workloads in AWS and/or Microsoft Azure environments.
Experience implementing observability, monitoring, and alerting using tools such as Prometheus and Grafana.

Additional Qualifications:

Familiarity with systems programming languages such as C, C++, Rust, Go, and/or Julia.
Relevant certifications such as CompTIA A+, Azure Engineer, or similar are preferred.
Active government clearance or the ability to obtain one is required.

On-Call & Incident Response Expectations:

This role includes participation in an on-call rotation supporting developer and production systems. The SRE is expected to respond to incidents outside of normal business hours as required, lead technical incident response efforts, communicate effectively with stakeholders during outages, and produce clear post-incident documentation and corrective action plans.

Compensation Range: $145,000–$175,000 per year (California). This range represents the anticipated base pay for this role; the final offer may vary based on qualifications, experience, and location.

Benefits:

Medical, Dental, & Vision - 100% covered premiums
Equity - Stock Options
401(k) match
WFH Hardware

Bolt is committed to building a diverse and inclusive environment in which we recognize and value each other’s differences, and to fostering a culture that promotes its core values: Professionalism, Integrity, and Respect. Bolt is an equal opportunity employer; all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, genetic information, national origin, age, disability, or status as a protected veteran.

Please note that Bolt Graphics does not currently sponsor candidates for this role. This role is based strictly in Sunnyvale, CA and requires candidates to be locally based, preferably in the immediate Bay Area.

440 N Wolfe Rd, Sunnyvale, CA 94085, United States

Similar Jobs

Domino Data Lab

Site Reliability Engineer

6 Days Ago

Remote or Hybrid

200K-230K Annually

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

Uniphore

Site Reliability Engineer

10 Days Ago

In-Office

Palo Alto, CA, USA

233K-336K Annually

Expert/Leader

Artificial Intelligence • Machine Learning

Lead platform reliability and automation at scale by building production Go services, Kubernetes operators, multi-cloud infrastructure, and self-service tooling. Provide technical leadership through architecture, code, on-call escalation ownership, incident remediation, and mentorship to elevate engineering teams' operational maturity.

Top Skills: AWS, Azure, Controller-Runtime, GCP, Go, Kubernetes, Kubernetes Operator, Terraform

CrowdStrike

Site Reliability Engineer

13 Days Ago

Hybrid

Sunnyvale, CA, USA

140K-215K Annually

Expert/Leader

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity

Lead and manage an SRE/Platform engineering team to ensure reliability, scalability, and performance of CrowdStrike's cloud-native security platform. Provide technical leadership, incident command, SLO-driven reliability, capacity planning, automation, and mentorship while collaborating with cross-functional teams.

Top Skills: Apache Flink, Apache Kafka, AWS, Azure, ELK, GCP, Go, Grafana, Istio, Jaeger, Kubernetes, Linkerd, OpenTelemetry, Prometheus, Splunk

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (PitchBook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
