Build and operate end-to-end ML infrastructure for autonomy: training, evaluation, deployment, monitoring, CI/CD, model observability, GPU optimization, and edge deployments, working closely with perception and autonomy teams to productionize models.
Teleo is a robotics startup disrupting a trillion-dollar industry. Teleo converts heavy construction equipment, such as loaders, dozers, excavators, and trucks, into autonomous robots. This technology allows a single operator to efficiently control multiple machines simultaneously, delivering substantial benefits to our customers while significantly enhancing operator safety and comfort.
Teleo was founded by Vinay Shet and Rom Clément, experienced technology executives who led the development of Lyft’s Self-Driving Car and Google Street View. Teleo is backed by Y Combinator, Up Partners, F-Prime Capital, and a host of industry luminaries. Teleo’s product is already deployed on several continents and generating revenue.
Teleo is poised for rapid growth. This presents a unique opportunity to be part of a team that is creating a product with a profound impact on our customers, working on cutting-edge 100,000-pound autonomous robots, engineering intricate systems at the intersection of hardware, software, and AI, and joining the early stages of an exciting startup journey.
About the Role
Own the reliability, scalability, and velocity of model training and deployment for autonomy systems. Turn experimental models into dependable production services.
Core Responsibilities
- Design and operate end-to-end ML infrastructure: training, evaluation, deployment, monitoring
- Build CI/CD for ML (model versioning, promotion, rollback, canarying)
- Own model observability: drift detection, performance regression, data health
- Optimize GPU utilization across training and inference (on-prem + cloud)
- Support edge deployment (Jetson / Orin / x86 + GPU)
- Work closely with perception and autonomy teams to reduce friction from research to production
Required Qualifications
- 2+ years in MLOps / Infra / ML Platform
- Deep experience with PyTorch and CUDA-aware workflows
- Strong Linux + systems fundamentals
- Proven experience deploying models at scale (not just notebooks)
Preferred Qualifications
- Training orchestration: Ray, Slurm, Kubernetes, Airflow
- Model lifecycle: Weights & Biases, MLflow, custom registries
- Containers: Docker, multi-arch builds
- Inference optimization: TensorRT, ONNX, Triton
- Monitoring: metrics, logs, alerts for ML systems
Bonus Points
- Experience with autonomy or robotics
- Edge deployment constraints (latency, power, thermal)
- Data versioning tools (DVC, LakeFS)
Teleo is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. All qualified people are encouraged to apply.
Top Skills
PyTorch, CUDA, Linux, Ray, Slurm, Kubernetes, Airflow, Weights & Biases, MLflow, Docker, TensorRT, ONNX, Triton, Jetson, Orin, DVC, LakeFS
Teleo Palo Alto, California, USA Office
Palo Alto, CA, United States, 94306