Voxel

Senior/Staff Software Engineer - ML Infrastructure

Sorry, this job was removed at 12:18 a.m. (PST) on Wednesday, Feb 04, 2026
In-Office
San Francisco, CA


Who Are We

Industrial labor is incredibly dangerous work: almost 3 million people in the US are injured in the workplace each year from entirely preventable, and at times fatal or debilitating, causes. Protecting these essential people who power our world is what motivates Voxelitos, and we'd love for you to join us. At Voxel, we're passionate about revolutionizing workplace safety and operations with groundbreaking, full-stack AI and computer vision technology.


Voxel’s site intelligence platform helps safety and operations leaders see the unseen risks, make strategic decisions, and prevent workplace incidents before they happen. Our customers include Fortune 500 companies across major grocers and retailers, manufacturers, food and beverage warehousers, and supply chain and logistics service providers. Based in SF with team members all over the globe, Voxel is backed by industry-leading VCs.




Voxel is looking for a Staff Machine-Learning Infrastructure Engineer to drive the next wave of our computer-vision platform for workplace safety. You will be the technical owner for three pillars of our ML lifecycle — ground-truth data & labeling workflows, large-scale training infrastructure, and continuous model lifecycle management. If you excel at designing cloud-native, distributed systems that turn raw video into production-ready, version-controlled models, we’d love to meet you.

What You'll Do

  • Own data & labeling pipelines – architect scalable labeling services (storage, query, retrieval), design ontologies, automate annotation workflows, and build quality-tiered datasets that stay within cost constraints.

  • Build and operate training infrastructure – create multi-GPU / multi-node training frameworks (Ray, Spark, Kubernetes), optimize distributed jobs, and integrate acceleration techniques (TensorRT, CUDA Graphs, FP8, etc.).

  • Manage the full model lifecycle – stand up model registries, version control, evaluation suites, and continuous-learning loops that push updates from dev → staging → prod with zero-downtime rollbacks.

  • Provide technical leadership, mentorship, and lightweight project management to a small infra + research squad.

  • Establish DevOps-for-ML best practices (IaC, CI/CD, observability, cost monitoring) so researchers can iterate quickly and safely.

  • Partner with ML engineers on architecture decisions, from data schemas to inference optimizations, ensuring infra and research roadmaps stay tightly aligned.

Qualifications (Must Haves)

  • Bachelor’s (or higher) in Computer Science, EE, or related field.

  • 5+ years building and operating large-scale infrastructure, with at least 3 years focused on ML or data-intensive systems.

  • Proven record designing highly available, distributed systems on Kubernetes (EKS, GKE, or on-prem).

  • Deep expertise with orchestration (K8s operators, Argo, Kubeflow), and cluster-scale storage / compute (S3, GCS, Ray, Spark, Dask).

  • Hands-on experience automating data-labeling or ground-truth workflows and maintaining dataset versioning.

  • Strong software-engineering fundamentals; familiar with best practices for testing, observability, and secure coding.

  • Demonstrated DevOps mindset — IaC (Terraform/CDK), CI/CD (GitHub Actions, ArgoCD), metrics & alerting (Prometheus/Grafana).

Nice-to-Haves

  • Experience running multi-instance / multi-GPU training jobs, mixed-precision optimizations, or TensorRT / Triton inference.

  • Familiarity with active-learning, continuous-training, or online distillation pipelines.

  • Background in model registry tooling (MLflow, BentoML, SageMaker Registry) and evaluation dashboards.

  • Prior work with computer-vision models (YOLO, DETR, Faster R-CNN) or video understanding at scale.

  • Contributions to open-source ML infra projects or published talks/blogs on MLOps.

  • Exposure to edge-deployment or real-time inference systems.

  • Experience shipping high-quality production code in Python.

Why Join Us?

Join a visionary team revolutionizing safety and operations, directly impacting the well-being of millions of essential workers. This is your chance to build an extraordinary business and foster a vibrant company culture that demands your absolute best. Alongside AI experts, experienced entrepreneurs, and passionate problem-solvers, you'll play a pivotal role in shaping the company's growth trajectory and market position. Enjoy a competitive salary, benefits, and a dynamic work environment.


Benefits:

Generous health, dental, and vision insurance.

Highly competitive paid parental leave and support system.

Ownership in the business through an Equity Incentive Plan.

Generous paid time off and/or flexible work arrangements.

Daily meals in-office, vibrant company events, team-building.

401(k) retirement plan, HSA options, pre-tax Commuter Card.

HQ

Voxel San Francisco, California, USA Office

425 2nd St, San Francisco, California, United States, 94107

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
