UniversalAGI

Founding ML Cloud Infrastructure Engineer

Reposted 7 Days Ago

In-Office

San Francisco, CA

Mid level

As a Founding ML Infrastructure Engineer, you'll build ML infrastructure for training and deployment, ensuring optimal performance and compliance, directly impacting AI for physics.

📍 San Francisco | Work Directly with CEO & founding team | Report to CEO | OpenAI for Physics | 🏢 5 Days Onsite

Founding ML Cloud Infrastructure Engineer

Location: Onsite in San Francisco

Compensation: Competitive Salary + Equity

Who We Are

UniversalAGI is building OpenAI for Physics. We are an AI startup based in San Francisco, backed by Elad Gil (#1 Solo VC), Eric Schmidt (former Google CEO), Prith Banerjee (ANSYS CTO), Ion Stoica (Databricks Founder), Jared Kushner (former Senior Advisor to the President), David Patterson (Turing Award Winner), and Luis Videgaray (former Foreign and Finance Minister of Mexico). We’re building foundation AI models for physics that enable end-to-end industrial automation, from initial design through optimization, validation, and production.

We're building a high-velocity team of relentless researchers and engineers that will define the next generation of AI for industrial engineering. If you're passionate about AI, physics, or the future of industrial innovation, we want to hear from you.

About the Role

As a founding ML Cloud Infrastructure Engineer, you'll be in the arena from day one, building the backbone that powers AI for physics at scale. This is your chance to build and own the entire ML infrastructure stack, from fine-tuning and training pipelines to low-latency customer deployments that serve foundation models in production.

You'll work directly with the CEO and founding team to build infrastructure that can train on petabytes of simulation data, serve physics models with strict accuracy requirements, and deploy seamlessly into customer environments with enterprise security and compliance needs. You'll be coming up with new paradigms for how AI models integrate into industrial engineering workflows.

What You'll Do

Build and scale fine-tuning and training infrastructure for foundation models, including distributed training across multi-GPU and multi-node clusters, optimizing for throughput, cost, and iteration speed
Design and implement model serving systems with low latency, high reliability, and the ability to handle complex physics workloads in production
Build fine-tuning pipelines that let customers adapt our foundation models to their specific use cases, data, and workflows without compromising model quality or security
Build deployment and serving infrastructure for on-premise and cloud environments, working through customer security requirements and compliance constraints
Create robust data pipelines that can ingest, validate, and preprocess massive CFD datasets from diverse sources and formats
Instrument everything: Build observability, monitoring, and debugging tools that give our team and customers full visibility into model performance, data quality, and system health
Work directly with customers on deployment, integration, and scaling challenges, turning their infrastructure pain points into product improvements
Move fast and ship: Take infrastructure from prototype to production in weeks, iterating based on real customer needs and research team feedback

This is a role for someone who's built ML systems that actually work in production, who understands both the research side and the operational reality, and is ready to solve some of humanity's hardest infrastructure problems.

Qualifications

3+ years of hands-on experience building and scaling ML infrastructure for fine-tuning, training, serving, or deployment
Deep experience with cloud platforms (AWS, GCP, Azure) and with infrastructure-as-code and container tooling (Terraform, Kubernetes, Docker)
Deep expertise in distributed training frameworks (PyTorch Distributed, DeepSpeed, Ray, etc.) and multi-GPU/multi-node orchestration
Strong foundation in ML serving: Experience building low-latency inference systems, model optimization, and production deployment
Expert-level coding skills in Python and infrastructure tools, comfortable diving deep into ML frameworks and optimizing performance
Understanding of ML workflows: Training pipelines, experiment tracking, model versioning, and the full lifecycle from research to production
Strong communicator capable of bridging customers, engineers, and researchers, translating infrastructure constraints into product decisions
Outstanding execution velocity: Ships fast, debugs quickly, and thrives in ambiguity
Exceptional problem-solving ability: Willing to dive deep into unfamiliar systems and figure out what's actually broken
Comfortable in high-intensity startup environments with evolving priorities and tight deadlines

Bonus Qualifications

Computer-aided engineering (CAE) software experience
Experience deploying ML in enterprise environments with strict security, compliance, and air-gapped requirements
Built fine-tuning infrastructure for foundation models
Experience with model optimization techniques
Deep understanding of GPU programming and performance optimization (CUDA, Triton, etc.)
Experience with large-scale data engineering for ML, ETL pipelines, and data validation systems
Built MLOps platforms or developer tools for ML teams
Experience at high-growth AI startups (Seed to Series C) or leading AI labs
Forward-deployed experience working directly with customers on complex integrations
Open-source contributions to ML infrastructure or training frameworks

Cultural Fit

Technical Respect: Ability to earn respect through hands-on technical contribution
Intensity: Thrives in our unusually intense culture - willing to grind when needed
Customer Obsession: Passionate about solving real customer problems, not just cool tech
Deep Work: Values long, uninterrupted periods of focused work over meetings
High Availability: Ready to be deeply involved whenever critical issues arise
Communication: Can translate complex technical concepts to customers and team
Growth Mindset: Embraces the compounding returns of intelligence and continuous learning
Startup Mindset: Comfortable with ambiguity, rapid change, and wearing multiple hats
Work Ethic: Willing to put in the extra hours when needed to hit critical milestones
Team Player: Collaborative approach with low ego and high accountability

What We Offer

Opportunity to shape the technical foundation of a rapidly growing foundational AI company.
Work on cutting-edge industrial AI problems with immediate real-world impact.
Direct collaboration with the founder & CEO and ability to influence company strategy.
Competitive compensation with significant equity upside.
In-person first culture - 5 days a week in office with a team that values face-to-face collaboration.
Access to world-class investors and advisors in the AI space.

Benefits

We provide great benefits, including:

Competitive compensation and equity.
Competitive health, dental, and vision benefits paid by the company.
401(k) plan offering.
Flexible vacation.
Team Building & Fun Activities.
Great scope, ownership and impact.
AI tools stipend.
Monthly commute stipend.
Monthly wellness / fitness stipend.
Daily office lunch & dinner covered by the company.
Immigration support.

How We’re Different

“The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again... who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly.” - Teddy Roosevelt

At our core, we believe in being “in the arena.” We are builders, problem solvers, and risk-takers who show up every day ready to put in the work: to sweat, to struggle, and to push past our limits. We know that real progress comes with missteps, iteration, and resilience. We embrace that journey fully, knowing that daring greatly is the only way to create something truly meaningful.

If you're ready to join the future of physics simulation, push creative boundaries, and deliver impact, UniversalAGI is the place for you.

Top Skills

AWS

Azure

DeepSpeed

Docker

GCP

Kubernetes

Python

PyTorch Distributed

Ray

Terraform

San Francisco, California, United States, 94107
