UniversalAGI

Head of ML Cloud Platform

Posted 16 Days Ago
In-Office
San Francisco, CA
Senior level
The Head of ML Cloud Platform will lead the architecture for AI-powered physics simulation infrastructure, manage technical teams, and engage directly with customers to ensure effective deployment and integration of ML models.
The summary above was generated by AI

📍 San Francisco | Work Directly with CEO & founding team | Report to CEO | OpenAI for Physics | 🏢 5 Days Onsite

Location: Onsite in San Francisco
Compensation: Competitive Salary + Significant Equity

Who We Are

UniversalAGI is building OpenAI for Physics: an AI startup based in San Francisco and backed by Elad Gil (#1 Solo VC), Eric Schmidt (former Google CEO), Prith Banerjee (ANSYS CTO), Ion Stoica (Databricks Founder), Jared Kushner (former Senior Advisor to the President), David Patterson (Turing Award Winner), and Luis Videgaray (former Foreign and Finance Minister of Mexico). We're building foundation AI models for physics that enable end-to-end industrial automation, from initial design through optimization, validation, and production.

We're building a high-velocity team of relentless researchers and engineers that will define the next generation of AI for industrial engineering. If you're passionate about AI, physics, or the future of industrial innovation, we want to hear from you.

About the Role

As the Head of ML Cloud Platform, you'll be in the arena from day one, building and leading the team that creates the backbone for AI-powered physics simulation at scale. This is your chance to own the entire ML infrastructure vision—from training foundation models on petabytes of CFD data to deploying them into mission-critical automotive and maritime production environments.

You'll work directly with the CEO and founding team to build a world-class ML platform organization, recruiting exceptional engineers and researchers while remaining deeply technical yourself. You'll architect systems that train models faster, serve predictions with lower latency, and integrate seamlessly into customers' existing CAE workflows—all while managing a team that ships with the velocity of a startup and the rigor of enterprise infrastructure.

This isn't a pure management role. You're a technical leader who codes, debugs production incidents at 2 AM when needed, and earns respect through hands-on contribution while simultaneously building the team and culture that will scale our platform to serve the world's largest industrial companies.

What You'll Do

Technical Leadership & Architecture
  • Define the ML platform vision: Architect the end-to-end infrastructure strategy for training, fine-tuning, serving, and deploying foundation models for physics simulation across cloud and on-premise environments

  • Build for scale and reliability: Design systems that can handle petabyte-scale CFD datasets, multi-day distributed training runs, and real-time inference for customers making million-dollar engineering decisions

  • Stay hands-on: Write code, debug critical production issues, review pull requests, and make key architectural decisions yourself—you're a technical leader who leads by doing

  • Bridge research and production: Translate cutting-edge research from our deep learning team into production-grade infrastructure that customers can depend on

  • Integrate with CAE ecosystems: Ensure our platform works seamlessly with existing simulation tools (Ansys, OpenFOAM, STAR-CCM+), HPC clusters, PLM systems, and enterprise security requirements

Team Building & Management
  • Recruit world-class talent: Build a team of exceptional ML infrastructure engineers, cloud platform engineers, and MLOps specialists who can execute at the highest level

  • Develop and mentor: Coach engineers to grow technically and professionally, fostering a culture of deep work, technical excellence, and customer obsession

  • Scale the organization: Grow the team from founding engineers to a robust platform organization as we scale from early customers to enterprise deployments

  • Set technical standards: Establish engineering practices, code review processes, and quality bars that enable the team to ship fast without breaking things

  • Foster collaboration: Work closely with deep learning researchers, product engineers, CFD domain experts, and customer success to ensure platform capabilities align with company needs

Execution & Delivery
  • Ship relentlessly: Drive the team to deliver infrastructure from prototype to production in weeks, not quarters, iterating based on real customer feedback

  • Own reliability: Take responsibility for platform uptime, performance, and customer success—when things break, you're in the arena fixing them

  • Make strategic tradeoffs: Balance innovation with stability, speed with quality, and custom solutions with scalable platforms

  • Work with customers: Engage directly with automotive and maritime customers to understand their infrastructure requirements, security constraints, and deployment challenges

  • Build for enterprise: Implement security, compliance, monitoring, and operational practices that meet the standards of Fortune 500 companies

Qualifications

Required Experience
  • 8+ years in ML infrastructure or cloud platform engineering, with at least 3 years in technical leadership roles managing high-performing teams

  • Proven track record building and scaling ML platforms for training, serving, or deploying models in production environments, ideally at AI-first companies

  • Deep technical expertise in distributed training (PyTorch Distributed, DeepSpeed, Ray), cloud infrastructure (AWS/GCP/Azure), and container orchestration (Kubernetes, Docker)

  • Hands-on coding ability: Expert-level Python and infrastructure-as-code skills—you can still ship production code yourself and review your team's work deeply

  • Team building success: Track record of recruiting, developing, and retaining exceptional engineering talent, with experience building teams from 3-4 engineers to 15-20+

  • Strong product and customer intuition: Experience working closely with customers, understanding their workflows, and translating requirements into technical solutions

  • Outstanding execution velocity: Proven ability to ship infrastructure rapidly in fast-paced, high-growth environments while maintaining quality

Technical Requirements
  • ML infrastructure mastery: Deep understanding of training pipelines, model serving, distributed systems, GPU optimization, and the full ML lifecycle

  • Cloud platform expertise: Strong experience with cloud providers, infrastructure-as-code tools, and building hybrid cloud/on-premise solutions

  • System design excellence: Can architect complex, scalable systems and make smart tradeoff decisions under uncertainty

  • Performance optimization: Knowledge of GPU programming, model optimization techniques, and infrastructure cost management

  • Enterprise infrastructure: Experience with security, compliance, SSO, RBAC, and deploying into regulated or air-gapped environments

Leadership & Communication
  • Technical credibility: Earns respect through deep technical contribution, not just title or tenure

  • Clear communicator: Can explain complex technical decisions to customers, executives, researchers, and engineers at all levels

  • Strategic thinker: Balances short-term execution with long-term platform vision and architectural decisions

  • Player-coach mentality: Comfortable coding and debugging yourself while also managing, mentoring, and growing a team

  • High agency: Takes ownership of outcomes, doesn't wait for permission, and drives solutions to completion

Bonus Qualifications
  • Experience in industrial or scientific ML: Built infrastructure for physics simulation, computational chemistry, drug discovery, or other scientific computing domains

  • CAE/HPC background: Familiarity with simulation software, job schedulers (SLURM, PBS), parallel file systems, or high-performance computing environments

  • Founded or led platform teams at AI startups (Seed to Series B) through rapid growth and scaling challenges

  • Published or presented on ML infrastructure, distributed training, or MLOps topics at major conferences or venues

  • Experience with foundation models: Built infrastructure for training or serving large-scale pretrained models (LLMs, vision models, multimodal models)

  • Open-source contributions to major ML infrastructure projects (PyTorch, Ray, Kubernetes, MLflow, etc.)

  • PhD or MS in Computer Science, ML, or related field (or equivalent industry experience)

  • Enterprise B2B experience: Sold to or deployed infrastructure for Fortune 500 customers with complex security and compliance requirements

Cultural Fit
  • Technical Respect: Ability to earn respect through hands-on technical contribution, not just management authority

  • Intensity: Thrives in our unusually intense culture, is willing to grind when needed, and expects the same from your team

  • Customer Obsession: Passionate about solving real customer problems and building infrastructure that enables their success

  • Deep Work: Values long, uninterrupted periods of focused work and fosters this culture in your team

  • High Availability: Ready to be deeply involved whenever critical issues arise, whether that's at 2 AM or on weekends

  • Communication: Can translate complex technical concepts to diverse audiences and bridge engineering, research, and business

  • Growth Mindset: Embraces continuous learning and develops this mindset in your team

  • Startup Mindset: Comfortable with ambiguity, rapid change, and wearing multiple hats—you're a builder first, manager second

  • Work Ethic: Willing to put in extra hours when needed to hit critical milestones, and holds your team to high standards

  • Low Ego, High Accountability: Collaborative leadership style with focus on outcomes over personal credit

What We Offer
  • Build the foundation: Shape the ML platform strategy for a rapidly growing foundational AI company from the ground up

  • Real-world impact: See your infrastructure power physics simulations that optimize automotive aerodynamics, maritime vessel design, and other critical engineering applications

  • Direct CEO collaboration: Work closely with the founder & CEO, influence company strategy, and have your voice heard on major decisions

  • Exceptional team: Recruit and work with world-class deep learning researchers, CFD experts, and infrastructure engineers

  • Competitive compensation: Base salary + significant equity upside as a founding leadership hire

  • In-person culture: 5 days a week in office with a team that values face-to-face collaboration, deep technical discussions, and building together

  • World-class network: Access to our investors and advisors including Eric Schmidt, Elad Gil, Ion Stoica, David Patterson, and others

Benefits
  • Competitive compensation and equity

  • Competitive health, dental, and vision benefits paid by the company

  • 401(k) plan offering

  • Flexible vacation

  • Team Building & Fun Activities

  • Great scope, ownership and impact

  • AI tools stipend

  • Monthly commute stipend

  • Monthly wellness / fitness stipend

  • Daily office lunch & dinner covered by the company

  • Immigration support

How We're Different

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again... who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly." - Teddy Roosevelt

At our core, we believe in being "in the arena." We are builders, problem solvers, and risk-takers who show up every day ready to put in the work: to sweat, to struggle, and to push past our limits. We know that real progress comes with missteps, iteration, and resilience. We embrace that journey fully, knowing that daring greatly is the only way to create something truly meaningful.

If you're ready to build the ML platform that will revolutionize physics simulation, lead a world-class team, and deliver transformative impact to industrial engineering, UniversalAGI is the place for you.

Top Skills

Ansys
AWS
Azure
DeepSpeed
Docker
GCP
Kubernetes
OpenFOAM
PBS
Python
PyTorch
Ray
SLURM
STAR-CCM+

HQ

UniversalAGI San Francisco, California, USA Office

San Francisco, California, United States, 94107

Similar Jobs

53 Minutes Ago
In-Office or Remote
2 Locations
130K-280K Annually
Senior level
Artificial Intelligence • Software • Automation
The Account Executive will identify potential customers, develop outbound strategies, manage sales processes, and collaborate cross-functionally while building the sales team.
Top Skills: SaaS
An Hour Ago
Hybrid
Los Angeles, CA, USA
23-31 Hourly
Entry level
Fintech • Financial Services
As an Associate Personal Banker, you'll build customer relationships, assist with account openings, and provide product solutions while following bank policies.
An Hour Ago
In-Office
San Francisco, CA, USA
99K-172K Annually
Senior level
Artificial Intelligence • Cloud • Consumer Web • eCommerce • Information Technology • Software
As a Senior Infrastructure Engineer, you'll design systems for web operations, apply SRE principles, and automate infrastructure using various programming languages and tools.
Top Skills: Amazon ECS, Ansible, Chef, Docker, ELK, Go, Kubernetes, Lightstep, New Relic, Nomad, Prometheus, Puppet, Python, Ruby, Scala, Sentry, Terraform

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
