Luma AI

Software Engineer, Inference

Reposted 24 Days Ago

Remote or Hybrid

7 Locations

188K-395K Annually

Mid level

The ML Engineer will integrate model architectures, optimize deployment workflows, maintain CI/CD pipelines, and ensure reliability of inference services across large-scale systems.

The summary above was generated by AI

About Luma AI

Luma's mission is to build multimodal AI to expand human imagination and capabilities.

We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step function change will come from vision. We are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change. We know we are not going to reach our goal without reliable and scalable infrastructure, which is going to become the differentiating factor between success and failure.

Role & Responsibilities

Ship new model architectures by integrating them into our inference engine
Collaborate closely across research, engineering and infrastructure to streamline and optimize model efficiency and deployments
Build internal tooling to measure, profile, and track the lifetime of inference jobs and workflows
Automate, test and maintain our inference services to ensure maximum uptime and reliability
Optimize deployment workflows to scale across thousands of machines
Manage and optimize our inference workloads across different clusters & hardware providers
Build sophisticated scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs
Build and maintain CI/CD pipelines for processing/optimizing model checkpoints, platform components, and SDKs for internal teams to integrate into our products/internal tooling

Background

Strong Python and system architecture skills
Experience with model deployment using PyTorch, Hugging Face, vLLM, SGLang, TensorRT-LLM, or similar
Experience with queues, scheduling, traffic control, and fleet management at scale
Experience with Linux, Docker, and Kubernetes
Bonus points:
- Experience with modern networking stacks, including RDMA (RoCE, InfiniBand, NVLink)
- Experience with high performance large scale ML systems (>100 GPUs)
- Experience with FFmpeg and multimedia processing

Example Projects

Create a resilient artifact store that manages all checkpoints across multiple versions of multiple models
Enable hotswapping of models for our GPU workers based on live traffic patterns
Build a robust queueing system for our jobs that takes into account cluster availability and user priority
Architect an end-to-end model serving deployment pipeline for a custom vendor
Integrate our inference stack into an online reinforcement learning pipeline
Run regression and precision testing across different hardware platforms
Build a full tracing system to trace the end-to-end lifetime of any inference workload

Tech stack

Must have

Python
Redis
S3-compatible Storage
Model serving (one of: PyTorch, vLLM, SGLang, Hugging Face)
Understanding of large-scale orchestration, deployment, scheduling (via Kubernetes or similar)

Nice to have

CUDA
FFmpeg

Compensation

The base pay range for this role is $187,500 – $395,000 per year.

About Luma

Luma’s mission is to build unified general intelligence that can generate, understand, and operate in the physical world.

Top Skills

CUDA
FFmpeg
Hugging Face
Kubernetes
Python
PyTorch
Redis
S3-compatible Storage
SGLang
vLLM

San Francisco, CA, United States
