Role Overview
We are hiring a Founding ML Infrastructure Engineer to own the end-to-end deployment, optimization, and operation of our suite of models in production.
This is a core founding role focused on building and operating production-grade LLM systems. You will apply deep knowledge of model internals to deploy, optimize, and run modern LLMs at scale, owning performance end-to-end across latency, throughput, and reliability.
You will design and operate the full ML serving stack from model artifacts to GPU execution, and work closely with Product and ML teams to ensure our models can support high QPS, strict SLAs, and production correctness.
This role is ideal for someone who deeply understands how LLMs work internally and chooses to specialize in making them fast, stable, and production-ready.
About Realm Labs
Realm Labs is an AI trust and security startup. We help enterprises detect, debug, and prevent AI’s misbehaviors in production. We are backed by top VCs and serve some of the most iconic global enterprises.
Key Responsibilities
- Own the end-to-end LLM inference stack, including:
- Model loading and execution
- GPU utilization and memory efficiency
- Runtime performance tuning
- Production deployment and scaling
- Design and operate high-performance LLM serving systems using technologies such as:
- vLLM, TensorRT / TensorRT-LLM, Triton Inference Server, SGLang
- Optimize inference across:
- Latency
- Throughput (QPS)
- GPU memory footprint
- Cost efficiency
- Work hands-on with PyTorch and TensorFlow models, including:
- Model graph understanding
- Attention mechanisms, KV cache behavior, batching strategies
- Precision tradeoffs (FP16, BF16, INT8, etc.)
- Build and maintain production-grade GPU services:
- Multi-model serving
- Autoscaling strategies
- Fault isolation and graceful degradation
- Collaborate with application and platform teams to:
- Define serving APIs
- Ensure correctness and safety of outputs
- Debug production issues end-to-end
- Build a reproducible model training and versioning system for customer deployments
- Establish best practices for:
- Model versioning
- Rollouts and rollbacks
- Performance benchmarking
- Production validation
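The precision tradeoffs and GPU memory footprint mentioned above translate directly into deployment budgets. As a rough back-of-the-envelope sketch (weights only, ignoring KV cache, activations, and framework overhead; the 7B parameter count is an illustrative assumption, not a statement about our models):

```python
# Rough weight-memory estimate for an LLM at different precisions.
# Weights only -- real deployments also budget for KV cache, activations,
# and framework overhead.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return approximate weight memory in GiB for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

params_7b = 7e9  # illustrative model size
for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_gib(params_7b, dtype):.1f} GiB")
```

At FP16 a 7B model's weights alone need roughly 13 GiB, which is why INT8/INT4 quantization is often the difference between fitting one GPU or needing several.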
Expected Qualifications
- 5+ years of professional experience in ML infrastructure, systems engineering, or production ML roles.
- Strong software engineering fundamentals; ability to write robust, maintainable production code.
- Deep hands-on experience with LLM inference infrastructure, including:
- PyTorch (required)
- TensorFlow (working knowledge)
- Proven experience with GPU inference optimization, including:
- TensorRT / TensorRT-LLM
- vLLM
- Triton Inference Server
- SGLang or similar serving runtimes
- Strong understanding of LLM internals, such as:
- Transformer architectures
- Attention and KV caching
- Batching, streaming, and token-level generation
- Experience running ML systems in production with high traffic and SLAs
- Comfortable working in Linux-based, cloud production environments
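The KV caching called out above is one of the biggest levers in autoregressive serving: without a cache, every generated token recomputes keys and values for the entire prefix. A toy cost model (pure Python, no real model; counting per-token K/V projections as the unit of work) makes the quadratic-versus-linear difference concrete:

```python
# Toy cost model for autoregressive decoding, counting per-token
# key/value projections as the unit of work.

def decode_cost(prompt_len: int, new_tokens: int, use_kv_cache: bool) -> int:
    """Total K/V projection count to generate `new_tokens` tokens."""
    ops = 0
    seq_len = prompt_len
    for _ in range(new_tokens):
        seq_len += 1
        if use_kv_cache:
            ops += 1          # project K/V only for the newest token
        else:
            ops += seq_len    # recompute K/V for the whole sequence
    return ops

print(decode_cost(128, 64, use_kv_cache=False))  # grows quadratically
print(decode_cost(128, 64, use_kv_cache=True))   # linear: one op per token
```

Real serving runtimes (vLLM's paged attention, for instance) build on this by also managing where those cached tensors live in GPU memory.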
Preferred Qualifications
- Experience deploying LLMs on Kubernetes and GPU clusters.
- Familiarity with CUDA, NCCL, or low-level GPU performance concepts.
- Experience with:
- Model sharding and parallelism strategies
- Multi-GPU inference
- Streaming inference systems
- Knowledge of observability for ML systems (metrics, latency breakdowns, GPU monitoring).
- Experience working at startups or owning systems with minimal abstraction layers.
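Observability for serving systems usually starts with latency percentiles and sustained QPS. A minimal, framework-free sketch of the kind of summary worth tracking (the nearest-rank percentile method and the synthetic timings are illustrative choices, not a prescribed implementation):

```python
# Minimal latency/throughput summary from a list of per-request timings.

def summarize(latencies_ms: list[float], window_s: float) -> dict:
    """Compute p50/p95/p99 latency and QPS over a measurement window."""
    s = sorted(latencies_ms)

    def pct(p: float) -> float:
        # Nearest-rank percentile over the sorted samples.
        idx = min(len(s) - 1, int(p / 100 * len(s)))
        return s[idx]

    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "qps": len(s) / window_s,
    }

# Synthetic timings: 100 requests observed over a 10-second window.
samples = [20.0 + i * 0.5 for i in range(100)]
print(summarize(samples, window_s=10.0))
```

In production these numbers would typically be exported per endpoint and per model to something like Prometheus, alongside GPU utilization and memory metrics.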
Additional Information
- This is a founding, high-ownership role with direct impact on core product capabilities.
- You will be expected to build, run, and own systems end-to-end.
- The role may include limited on-call responsibilities aligned with production ownership.
Compensation & Benefits
- Market-aligned compensation and benefits
- Founding engineer equity (equity is a significant component of this role and will be discussed)
- Medical, dental, vision, and life insurance; 401(k); in-office lunch; and more
Visa sponsorship: We do sponsor visas! However, we aren't able to sponsor visas for every role and candidate. If we make you an offer, we will make every reasonable effort to secure your visa, and we retain an immigration lawyer to help with this.
The base pay range for this role is $180,000 – $250,000 per year.