Inference Optimization ML Engineer

Rhoda AI

Posted Yesterday

In-Office

Mountain View, CA, USA

Mid level

At Rhoda AI, we’re building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.

We're looking for an Inference Optimization MLE to help build and operate the systems that make our foundation models run fast and efficiently in production. You'll be responsible for squeezing maximum performance out of large multimodal models across cloud and on-robot deployment targets. You'll work closely with research and robotics teams to close the gap between training and real-world deployment.

What You'll Do

Own inference performance end-to-end — diagnose and improve latency, throughput, and efficiency of large foundation models in production
Build systematic performance attribution: latency decomposition (compute vs. memory bandwidth vs. I/O), bottleneck identification, and prioritization across model families
Apply and develop optimization techniques including quantization, pruning, distillation, operator fusion, and model compilation (e.g., TensorRT, torch.compile, XLA)
Optimize attention mechanisms, KV caching, and memory layouts for large multimodal models (vision, video, language, proprioception)
Work with kernel-level tooling (e.g., CUDA, Triton) to identify hotspots and implement or tune custom kernels where needed
Build benchmarking and regression detection infrastructure: latency baselines, throughput curves, and automated detection of performance regressions across model versions
Collaborate closely with research engineers to translate model innovations into optimized, deployment-ready implementations

What We're Looking For

3+ years of experience in inference optimization, ML systems, or a closely related field
Deep hands-on experience with modern ML stacks (PyTorch required; JAX a plus)
Strong understanding of compute, memory bandwidth, and I/O bottlenecks in large model inference
Experience with model optimization techniques: quantization (INT8/FP8/AWQ), distillation, pruning, and compilation
Familiarity with inference serving frameworks and runtimes (e.g., Triton Inference Server, TensorRT, vLLM, TorchServe)
Exceptional debugging and measurement ability: turn "inference is slow" into clear bottlenecks, experiments, and validated improvements
High ownership mindset and comfort in a fast-moving environment

Nice to Have (But Not Required)

GPU kernel or compiler-level experience (CUDA, Triton, graph capture, operator fusion)
Experience with multimodal or video model inference (variable-length sequences, packing/bucketing)
Familiarity with edge/cloud hybrid deployment patterns and on-robot inference constraints
Experience with speculative decoding, continuous batching, or other LLM serving optimizations
Background in streaming or low-latency systems relevant to real-time robot control

Why This Role

Direct leverage on research velocity and real-world robot performance — every efficiency gain you make accelerates model iteration and tightens the loop between model and robot behavior
Own the optimization layer that determines how quickly and efficiently our foundation models run in the real world — high ownership, high impact, small elite team
