Kumo

Software Engineer - Cloud Infrastructure

Reposted 19 Days Ago
Hybrid
Mountain View, CA
145K-215K Annually
Mid level
As a Cloud Infrastructure Engineer, you will manage and optimize Kubernetes clusters across multiple cloud platforms while enhancing automation and reliability for AI applications.
The summary above was generated by AI
About Kumo.ai

Kumo is building the infrastructure layer for the next generation of enterprise AI — a platform that lets organizations turn their data into predictive intelligence instantly, without the heavy lifting of traditional ML pipelines. We have also built our own Relational Foundation Model that can provide predictions in seconds – no training, straight to business value!

Join a dynamic, rapidly expanding team of innovators from top-tier companies like Airbnb, LinkedIn, Pinterest, and Stanford, supported by the renowned Sequoia Capital. We're on the front lines of AI, solving some of its most challenging and impactful problems, and we've already delivered more than $500M in tangible value to industry giants like Reddit, DoorDash, and Databricks. If you thrive in a fast-paced environment, are driven by ambitious goals, and crave an opportunity for massive impact, this is your chance to shape the future of AI.

The Opportunity

Kumo’s platform runs thousands of predictive workloads across multi-tenant Kubernetes clusters that form the backbone of our AI stack. As a Cloud Infrastructure Engineer, you’ll own, scale, and optimize that platform, from real-time inference to large-scale training, with real production impact. You’ll make high-leverage architectural decisions, ship quickly, and collaborate across ML, product, and engineering teams to expand our multi-cloud capabilities. Expect to move fast, iterate often, and see your changes land in production within days, not quarters.

What You’ll Do

  • Design, build, and evolve Kumo’s multi-tenant infrastructure to support massive AI and data workloads across AWS, Azure, and GCP.
  • Implement and maintain infrastructure-as-code to automate training and deployment pipelines across many environments.
  • Operate and scale Kubernetes clusters with a focus on reliability, performance, availability, tenant isolation, and cost efficiency.
  • Build observability and alerting into distributed systems using Prometheus, Grafana, OpenTelemetry, and related tooling.
  • Partner closely with ML researchers and product teams to deliver production-grade infrastructure for advanced AI workloads.
  • Drive security and operational best practices (RBAC, tenant isolation, cloud identity, etc.) across our platform.

What You Bring

  • 3–5 years building or operating cloud-native infrastructure in production.
  • Hands-on experience with at least one major cloud (AWS / Azure / GCP); multi-cloud exposure is a plus.
  • Operational experience with Kubernetes and production-grade clusters.
  • Proficiency with Infrastructure-as-Code (Terraform, Pulumi, etc.) and familiarity with GitOps tooling (ArgoCD, Flux, Argo Workflows).
  • Strong debugging, systems-thinking, and communication skills — you can drive technical decisions and explain trade-offs to multiple stakeholders.

Nice to Have

  • Experience operating multi-tenant Kubernetes for data / AI workloads.
  • Experience with (managed) Spark or large-scale data processing systems.
  • Familiarity with Kubernetes operators, controllers, and custom resources.
  • Deep experience with monitoring/tracing/logging stacks (Prometheus, OpenTelemetry, etc.).

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Top Skills

Argo CD
AWS
Azure
Bash
Crossplane
Flux
GCP
Go
Grafana
Kubernetes
OpenTelemetry
Prometheus
Pulumi
Python
Terraform
HQ

Kumo Mountain View, California, USA Office

357 Castro St, Suite 200, Mountain View, CA, United States, 94041
