The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.
As a Machine Learning and System Optimization Engineer, you will orchestrate and allocate overall system capacity across the core perception models running on-bot, and drive large initiatives that make inference more efficient by sharing parts of the perception stack across models.
You will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience compressing, accelerating, and deploying complex models, including LLMs, VLMs, or foundation models, for power- and thermal-constrained vehicle SoCs.
In addition, you will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.
In this role, you will:
Allocate and distribute system resources (CPU/GPU/interconnect) to various models and inference engines running on the robot.
Spearhead cross-cutting initiatives that allow for better compute utilization through sharing/fusing models and better scheduling strategies.
Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
Write production-level, low-latency, and memory-safe C++ and CUDA code for real-time inference on vehicle systems.
Qualifications:
Deep experience in system and performance optimization in CPU/GPU systems designed for low latency or high throughput.
Deep expertise in working with real-time systems and their constraints, such as processing latency, memory utilization, and memory bandwidth pressure.
Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference (INT8, FP8, FP4, BF16/FP16).
Proficiency in low-level programming for AI accelerators, specifically developing and optimizing custom ML ops and TensorRT plugins with efficient CUDA kernel implementations.
Production-level C++ (14/17/20) and Python programming skills, with experience developing concurrent, memory-safe, real-time inference code for edge devices.
Bonus Qualifications:
Prior experience in high-performance robotics applications such as AV/drones/robots.
Familiarity with SOTA autonomous driving perception algorithms (temporal 3D object detection, BEV, 3D Occupancy Networks) and multi-modal sensor processing (Vision, LiDAR, Radar).
Experience with end-to-end autonomous driving paradigms (VLM/VLA models, Foundation models) and edge deployment technologies (e.g., TensorRT-LLM).
Zoox Foster City, California, USA Office
4000 E 3rd Ave, Foster City, CA, United States, 94404
Zoox Foster City, California, USA Office
1149 Chess Drive, Foster City, CA, United States, 94404
Zoox Fremont, California, USA Office
47540 Kato Road, Fremont, CA, United States, 94538
Zoox San Francisco, California, USA Office
60 Broadway St, San Francisco, CA, United States, 94111