Voltage Park

Site Reliability Engineer

Sorry, this job was removed at 06:10 p.m. (PST) on Thursday, Mar 13, 2025
Remote
2 Locations
140K-180K Annually

Voltage Park’s mission is to make AI infrastructure accessible to all. Today, we own 24,000+ H100s and operate 7+ data centers across the US. We serve customers of all sizes, from small research labs to large enterprises. As part of this effort, we’re hiring a Site Reliability Engineer to build out and operate our core infrastructure, including bare metal provisioning, telemetry, storage, and container / VM orchestration.

To succeed in this role, you will need to be comfortable owning the care and feeding of thousands of GPU servers and related support infrastructure, including logging, analytics, automations, testing, and SOPs. You’ll play a pivotal role as a member of the team, responsible for bringing a substantial amount of infrastructure online across multiple data centers. You’ll also have an important role in defining the company’s culture and ensuring mission success.

This is a fully remote role; however, some overlap with core PST work hours is required. You must be located in the United States, and we are unable to provide visa sponsorship at this time.

Responsibilities

  • At the direction of the Manager of Site Reliability Engineering, design, build, and roll out new platforms and patterns to minimize incidents and enable customer-facing and internal features.

  • Deploy updates and improvements to support both Voltage Park’s internal and end customer use cases.

  • Collaborate with colleagues in network engineering, software development, and customer support in a flat organization.

  • Participate in the SRE on-call rotation (1 week on, 5+ weeks off).

Qualifications

  • 8+ years working with Linux as a server / hosting platform; extra points for Ubuntu experience.

  • 5+ years of experience with AWS.

  • 2+ years of experience with Kubernetes and strong container fundamentals.

  • 2+ years of experience with Terraform and Ansible.

  • 2+ years with network attached storage management (via NFS, Ceph, or other protocols). Extra points for experience with VAST storage systems.

  • Experience working in a Slack-first, asynchronous remote work environment.

  • Experience with monitoring systems (Prometheus, ELK stack).

  • Familiarity with the GitOps workflow.

  • Software development experience using Python, Go, bash, or other languages for automation and for connecting systems and APIs.

  • Deep networking fundamentals; extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand.

  • Experience architecting, building, and delivering complex systems from 0 to 1.

  • Adept at balancing pragmatic development and ideal architectures. Effective at navigating tradeoffs between design, risk, cost, and outcomes.

  • Comfortable with navigating ambiguity.

  • Strong written and oral communication.

Ideal Experiences

  • Experience with bare metal hardware troubleshooting and provisioning; extra points for working with Dell hardware.

  • Experience with GPU servers, either on bare metal or under virtualization.

  • Deep experience with network switches, routers, and firewalls, particularly SONiC switches and Palo Alto firewalls.

  • Experience with VAST storage systems.

Culture

  • You enjoy working with a small group of friendly, highly motivated, execution-focused colleagues.

  • You’re comfortable with a high degree of autonomy. We expect you to independently prioritize your work and understand how it maps to the overall needs and goals of the company.

  • You’re knowledgeable in your domain but also enjoy wearing multiple hats and venturing outside of your comfort zone when the need arises.

  • You value the ability to write well and understand the importance of good documentation.

Voltage Park is an equal opportunity employer and makes employment decisions on the basis of merit. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected under federal, state, or local law. If you require an accommodation during the job application process, please notify your recruiter.

Compensation Range: $140K - $180K


HQ

Voltage Park San Francisco, California, USA Office

555 Montgomery Street, San Francisco, CA, United States

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
