Decagon

Senior Software Engineer, ML Infrastructure

Reposted 9 Days Ago

In-Office

San Francisco, CA, USA

250K-330K Annually

Senior level

The role involves designing ML infrastructure, building distributed training systems, integrating algorithms, and ensuring reliable inference architecture. Responsibilities include mentoring, driving technical direction, and managing projects effectively.

The summary above was generated by AI

About Decagon

Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences.

Our technology enables industry-defining enterprises like Avis Budget Group, Block’s Cash App and Square, Chime, Oura Health, and Hunter Douglas to deploy AI agents that power personalized, deeply satisfying interactions across voice, chat, email, SMS, and every other channel.

We’re building a future where customer experience moves beyond support tickets and hold music toward faster resolutions, richer conversations, and deeper relationships. We’re proud to be backed by world-class investors who share that vision, including a16z, Accel, Bain Capital Ventures, Coatue, and Index Ventures, along with many others.

We’re an in-office company, driven by a shared commitment to excellence and velocity. Our values — Just Get It Done, Invent What Customers Want, Winner’s Mindset, and The Polymath Principle — shape how we work and grow as a team.

About the Team

The ML Infrastructure team builds the systems that power every stage of Decagon's model lifecycle. We own the platforms for model training, the infrastructure for model evaluation and experimentation, and the routing layer that manages inference across multiple providers.

We work at the intersection of research and production: translating cutting-edge ML models into reliable, scalable systems that run in customer environments. We collaborate closely with Research, Infrastructure, and Product teams to ensure models train efficiently, serve reliably, and deliver exceptional user experiences.

The team values technical rigor, pragmatic decision-making, and building systems that others love to use.

About the Role

We're hiring a Senior ML Infrastructure Engineer to own the platforms powering Decagon's model training and inference. You'll build distributed training systems, design inference architecture across multiple providers, and create the frameworks that let our Research and Product teams ship faster.

This role is for someone who thrives on technical depth, can lead multi-quarter initiatives, and wants to shape the long-term architecture of our ML stack.

In this role, you will

Design and build distributed training platforms for LLM and multimodal fine-tuning and post-training at scale
Integrate state-of-the-art training algorithms into production pipelines
Own inference architecture and multi-provider routing, including failover and optimization
Lead initiatives to improve latency and cost efficiency across the training and serving stack
Build evaluation and experimentation infrastructure that enables rapid, reliable iteration
Drive technical direction, mentor engineers, and establish best practices for ML infrastructure

Your background looks something like this

6+ years building ML infrastructure or production systems at scale
Deep experience with distributed training: multi-node GPU clusters, fault tolerance, and optimization
Strong understanding of LLM inference: latency optimization, provider tradeoffs, and serving architecture
Proven track record leading complex, multi-quarter technical projects

Compensation

$200K – $400K + Offers Equity
This range reflects the expected compensation for this role. Compensation within the range is determined based on experience, skills, and the scope of responsibilities, with flexibility for candidates who demonstrate exceptional impact.
In addition to base salary, we offer competitive equity. Final compensation may vary based on location within the United States.

Benefits

We proudly offer the following benefits for our full-time employees:

Take-what-you-need vacation policy (subject to local requirements; UK employees receive 25 days of statutory leave)
Medical, Dental, and Vision benefits for you and your family
Life Insurance and Disability Benefits
Retirement Plan (e.g., 401(k), pension)
Parental Leave
Fertility and family building benefits through Carrot
Daily lunches and snacks in the office to keep you at your best

These benefits are described in more detail in Decagon’s policies, may vary by location, and can change at any time according to applicable compensation and benefits plans.

2261 Market St, 5378, San Francisco, California, United States, 94114
