Staff Software Engineer, ML Infrastructure

Voxel

Reposted 4 Days Ago

Hybrid

San Francisco, CA, USA

220K-260K Annually

Senior level

The Staff Software Engineer will lead ML infrastructure, architect systems for model training and deployment, and mentor engineers on best practices.

The summary above was generated by AI

Who We Are

Voxel is building the future of Computer Vision and Machine Learning for operations, risk, and safety. We use computer vision and AI to enable existing security cameras to automatically detect hazards and high-risk activities, keep people safe, and drive operational efficiencies. Our technology addresses the key cost drivers for workers’ compensation, general liability, and property damage, which cost US employers over $500 billion annually. Our customers include Fortune 500 companies across grocery, retail, manufacturing, food and beverage, logistics, and pharmaceutical distribution. We’ve passed $10M ARR with strong expansion revenue. We're based in SF and backed by industry-leading VCs.

About the Role

Voxel’s perception system is the technical core of everything we ship. Our models detect human activity, equipment interactions, environmental hazards, and operational state in real time across thousands of cameras in manufacturing, logistics, retail, and pharmaceutical environments. Safety was our wedge; it proved our platform works. Now customers are pulling us into operations: equipment utilization, workflow compliance, process efficiency. Every new use case runs through the perception team.

We're hiring a Staff Software Engineer to own ML Infrastructure at Voxel. Our applied ML team is shipping vision models into production every week, across thousands of cameras at Fortune 500 customers, and the infrastructure underneath determines how fast we can move. You'll set the technical direction for how we train, track, and ship vision models, build the foundational systems that the applied ML team relies on, and shape the architectural decisions that will define our ML stack for the next several years.

This is a hands-on role. You'll write code, make architecture calls, and own outcomes end to end. You'll partner closely with applied CV engineers, the ML Data team, and the Platform team, and you'll be the technical voice in the room when ML infrastructure tradeoffs come up.

What You'll Do

Set the technical direction for ML infrastructure at Voxel: what we build, what we buy, and how the pieces fit together as the team and model portfolio scale
Architect and build the training infrastructure that lets the applied ML team run multiple experiments concurrently and iterate quickly on new architectures (PyTorch, AWS)
Own the train-to-deploy handoff: export trained models to optimized inference formats (TensorRT, ONNX), quantify accuracy and latency impact, and partner with Platform on production deployment
Pick and roll out the experiment tracking and lifecycle stack (Weights & Biases, MLflow, ClearML, or similar) so researchers can run, compare, and reproduce experiments efficiently
Establish DevOps-for-ML best practices (IaC, CI/CD, observability, cost monitoring) so researchers can iterate quickly and safely
Mentor engineers across Vision & AI on ML infrastructure best practices, raising the bar for how the org thinks about training, evaluation, and deployment
Anticipate where the infrastructure needs to be in 12 to 18 months, including the upcoming move to on-device inference, and architect for that future

What We're Looking For

7+ years building and shipping large-scale software systems, with at least 3 years focused on ML infrastructure or large-scale data infrastructure
A track record of being the person who decides the architecture, not just the person who implements it. You've owned tool selection, framework choices, and build-vs-buy calls for systems other engineers depend on
Deep fluency in PyTorch and the modern ML training stack. You know what good experiment tracking looks like, what makes a training pipeline reliable at scale, and where the failure modes live
Strong Python. Performant, maintainable code that holds up in production
A pragmatic shipping orientation. You can tell the difference between architectural decisions that need to be right and ones that can be revisited later, and you don't over-engineer the latter
Strong communication skills. You can explain complex tradeoffs clearly to ML researchers, infra peers, and leadership

Nice to Have

Production experience on AWS (S3, EC2, EKS, or similar) for ML workloads
Hands-on experience with model export and inference optimization (TensorRT, ONNX, or similar), including measuring accuracy and latency tradeoffs against training-time baselines
Experience with modern ML orchestration tools (Ray, Sematic, Flyte, Metaflow, Prefect, or similar)
Familiarity with GPU performance profiling and optimization (Nsight, PyTorch profiler, or similar)
Background in computer vision model training

Compensation & Benefits

Equity through Voxel’s Equity Incentive Plan
Total compensation includes base salary, annual bonus, and equity
Comprehensive health, dental, and vision insurance
Competitive paid parental leave
Unlimited PTO and flexible work arrangements
Daily meals in-office, team events, annual company onsite
