Sunday (sunday.ai)

ML Infrastructure Engineer

In-Office
Redwood City, CA, USA
Mid level

Join Us in Building the Future of Home Robotics

At Sunday, we're developing personal robots to reclaim the hours lost to repetitive tasks. We're focused on an ambitious goal to make generalized robots broadly accessible, enabling households to take back quality time.

We have spent the last 18 months building a talented team, securing capital, and validating our technology. We are now seeking passionate individuals to join us in the next phase of our growth. If you are ready to apply your skills to the forefront of robotics innovation, we’d love to hear from you.

The Role

Sunday Robotics is building the future of home robotics. We're developing end-to-end ML models for robot manipulation, and you'll have the opportunity to build and shape foundational systems that directly accelerate our path to putting robots in homes.

This is a broad role that can be tailored to your specific area of expertise: data pipelines, training infrastructure or inference. You'll build systems across the full robot learning pipeline: ingesting and processing multimodal data, scaling distributed training, optimizing inference for real-time control and building research tooling.

What You'll Do

Training and Inference Infrastructure

  • Maintain an effective research codebase with good ergonomics, optimizing for fast iteration and correctness

  • Own infrastructure for model training: job scheduling, checkpointing, metrics, and logging

  • Scale distributed training across GPU clusters with minimal researcher friction

  • Enable training of larger models through sharding, activation checkpointing and memory optimization

  • Profile and optimize GPU utilization, memory usage, and training throughput

  • Build low-latency inference pipelines for real-time robot control, applying quantization, distillation, and model compilation to optimize inference performance

  • Work closely with researchers and roboticists to translate research needs into reliable software and infrastructure

Data Pipelines and Research Tooling

  • Design high-throughput pipelines for ingesting, validating, and transforming multimodal robot data (video, proprioception, actions)

  • Build storage systems and metadata indexing for efficient dataset management at large scale

  • Optimize dataloaders, sharding and prefetching to minimize time from data arrival to model training

  • Build research tooling for debugging, visualization and experiment analysis

What We're Looking For

  • Strong software engineering and systems fundamentals

  • Experience building distributed systems or large-scale data pipelines

  • Hands-on experience with ML training infrastructure, ideally PyTorch

  • Comfort reasoning about performance, memory, I/O, and GPU utilization

  • Experience managing training workloads (SLURM, Kubernetes, or similar)

  • Ownership mindset: you design, build, operate, and iterate on systems end-to-end

  • Enjoy working closely with researchers and unblocking fast-moving projects

Nice to Have

  • Experience with robotics data pipelines or multimodal models

  • Background in vision-language-action models (VLAs), video generation architectures, or robot learning systems

  • Deep ML systems experience: training compilers, custom kernels, runtime optimization

  • Hands-on GPU performance tuning

  • Experience with serialization formats for high-performance systems (Protobuf, FlatBuffers, MCAP)

At Sunday Robotics, we’re building technology shaped by real people — curious, creative, and diverse. We’re proud to be an equal opportunity employer and consider all qualified applicants regardless of race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.

Even if you don’t meet every single requirement, we encourage you to apply. Studies show that women and underrepresented groups often hold back unless they meet 100% of the criteria — we don’t want that to be the reason we miss out on great talent.
