
Eventual

Research Engineer, Multimodal Data

Posted 10 Days Ago
In-Office
San Francisco, CA, USA
$150K-$250K Annually
Mid level
As a Research Engineer, you'll advance Eventual's visual understanding capabilities: training and deploying models that make petabyte-scale video datasets queryable, driving down annotation costs, and collaborating with customers on data taxonomy design.
About Eventual

Every breakthrough Physical AI system — humanoid robots, autonomous vehicles, video generation models — is trained on petabytes of video, lidar, radar, and sensor data. But today's data platforms (Databricks, Snowflake) were built for spreadsheet-like analytics, not the multimodal corpora that power AI. As a result, robotics and video-AI teams iterate on model improvement about once a week. Most of that week isn't training — it's finding the right data: writing CV heuristics over raw footage, paying annotators for edge cases, hand-curating clips before a cluster ever spins up. GPU bandwidth has grown 2-3× per generation. Storage and pipelines haven't. The gap widens every year.

Eventual was founded in 2022 to close it. Our open-source engine, Daft, is the distributed data engine purpose-built for multimodal AI — already running 2 PB/day at Amazon, 60-100 PB at another FAANG company, and in production at Mobileye, TogetherAI, and CloudKitchens. We are building a video-native index on top of our engine for Physical AI that collapses the data iteration loop. Describe the dataset you want, get a curated table in minutes, feed it to your GPUs at line rate. One iteration per day becomes the norm.

We're building this in partnership with the top Physical AI labs and public AI infrastructure companies today. We have raised $30M from Felicis, CRV, Microsoft M12, Citi, Essence, Y Combinator, Caffeinated Capital, Array.vc, and angels including the co-founders of Databricks and Perplexity. We've assembled a world-class team from AWS, Render, Pinecone, and Tesla. We have spent our careers powering the last generation of Physical AI in self-driving, and are excited to now do the same for the next.

Join our small (but powerful!) team working together 4 days/week in our SF Mission district office.

Your Role

As a Research Engineer on the Visual Understanding team, you'll own the layer that makes petabytes of video queryable by content. Physical AI teams have raw footage, lidar, radar, and sim outputs scattered across object stores, with no way to find what they need without weeks of human annotation. We change those economics: we run vision-language models over every clip in a corpus along axes the customer cares about (gripper type, failure mode, object class, scene, motion density), so a researcher can ask for "left-arm grasp failures on deformable objects" and get a curated dataset in minutes.

You'll define the roadmap for our visual understanding capabilities, train and select the models that make corpus-scale annotation tractable at single-digit cents per hour of video, and build the rich datasets that go on to train customer models. This is a research engineering role — meaning you'll read papers and run experiments, but you ship to production and your work is judged by what it does for customer training runs.

Key Responsibilities
  • Own the visual understanding roadmap end-to-end: from picking the model family for a customer's taxonomy to landing it in production inference at corpus scale.

  • Train, fine-tune, and evaluate VLMs, VQA models, embedding models, and convolutional perception models against customer datasets and benchmarks.

  • Drive down per-clip annotation cost — model selection, distillation, batching, decode pipelining — so "annotate every clip in a 10K-hour corpus" stays economical.

  • Build the rich, queryable datasets that customers train on: design taxonomies with researchers, instrument quality, version the outputs.

  • Partner with the dataloading and storage teams so visual understanding outputs flow into the index and on to the GPU without re-engineering.

  • Work directly with researchers at our partner labs — your shortest feedback loop is their next training iteration.

What we look for
  • Strong familiarity with modern vision and multimodal models — convolutional nets, VLMs, VQA, embeddings — and a sense for which SOTA models are actually deployable today versus only topping a leaderboard.

  • Experience running these models at scale on real video and sensor data, ideally for perception tasks (detection, tracking, segmentation, retrieval, captioning).

  • Background from a perception team at a self-driving, robotics, or visual-data company — or equivalent depth from a research lab.

  • Comfortable with cloud infrastructure and large-scale data processing — you don't need to be a distributed-systems engineer, but you've shipped jobs that ran on thousands of GPU-hours of video.

  • Bias toward data and infrastructure: you reach for "annotate the whole corpus" before "fine-tune another model."

Nice to have
  • Experience training vision or multimodal models from scratch (not just calling APIs).

  • ML/AI research background — papers, citations, or a research org on your resume.

  • Hands-on time with big-data frameworks like Spark, Ray, or Daft.

  • Worked on embeddings, retrieval, or content-aware search at scale.

  • Experience designing labeling taxonomies or running annotation programs.

Perks & Benefits
  • In-person, tight-knit team — 4 days/week in our SF Mission office.

  • Competitive comp and meaningful startup equity.

  • Catered lunches and dinners for SF employees.

  • Commuter benefit.

  • Team-building events and poker nights.

  • Health, vision, and dental coverage.

  • Flexible PTO.

  • Latest Apple equipment.

  • 401(k) plan with match.

If you're excited about being on the team that turns petabytes of raw video into the training data for the next generation of Physical AI, we'd love to talk.

HQ

Eventual San Francisco Office

2 Embarcadero Center, San Francisco, CA 94111, USA


