Causal Labs

Machine Learning - Infrastructure

Reposted 15 Days Ago

In-Office

San Francisco, CA, USA

Mid level

Design and maintain distributed ML training clusters, develop scalable pipelines for large datasets, and optimize performance for ML workloads.

The summary above was generated by AI

Our mission is general causal intelligence: AI capable of (1) predicting the future and (2) identifying the optimal actions to change that future.

To achieve this breakthrough, we are building a Large Physics foundation Model (LPM), because domains governed by physics have inherent cause-and-effect relationships, unlike visual or textual data.

Weather is the ideal training ground for an LPM. It is the most well-observed physical system, offering rapid, objective ground truth feedback from sensory observations and data at a scale that dwarfs what is used to train today’s LLMs.

Causal Labs is a team of researchers and engineers from self-driving, drug discovery, and robotics, drawn from companies including Google DeepMind, Cruise, Waymo, Meta, Nabla Bio, and Apple, who believe general causal intelligence will be the most important technical breakthrough for civilization.

We look for infrastructure engineers who are excited to tackle unsolved problems.

Our training and inference challenges demand deep expertise in setting up distributed training clusters and optimizing performance for large models. If you have experience building large-scale ML infrastructure in related fields such as language and vision models, robotics, or biology, join us on this mission.

Responsibilities

Design, deploy, and maintain large distributed ML training and inference clusters
Develop efficient, scalable end-to-end pipelines to manage petabyte-scale datasets and model training throughout the entire ML lifecycle
Research and test various training approaches, including parallelization techniques and numerical precision trade-offs, across different model scales
Analyze, profile, and debug low-level GPU operations to optimize performance
Stay up to date on research and bring new ideas into our work

What we’re looking for

We value a relentless approach to problem-solving, rapid execution, and the ability to quickly learn in unfamiliar domains.

Strong grasp of state-of-the-art techniques for optimizing training and inference workloads
Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings
Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
Background working on distributed task management systems and scalable model serving & deployment architectures
Understanding of monitoring, logging, observability, and version control best practices for ML systems

You don’t have to meet every single requirement above.

Top Skills

AWS

Azure

Docker

GCP

Kubernetes

ML Training Frameworks

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (PitchBook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
