WEKA

Tech Lead - AI Inference

Posted 2 Days Ago
Remote
Hiring Remotely in U.S.
Senior level
Lead and own end-to-end LLM inference infrastructure, direct a small engineering squad, design and deliver high-throughput low-latency serving systems, remain hands-on (Python/C++/CUDA), solve inference-scale problems (KV caching, RDMA, NVMe, multi-tier GPU memory), and mentor engineers.
The summary above was generated by AI

We are seeking a Tech Lead to lead our AI Inference team. In this role, you will bridge the gap between complex research and production-grade engineering, while cultivating a high-performing team culture. You will lead and grow a squad of 3 developers, balancing hands-on technical contribution with strong people leadership — setting direction, unblocking your team, and driving execution on high-performance systems that optimize Large Language Model (LLM) serving.

The ideal candidate combines deep technical expertise in inference and scale with the leadership maturity to mentor, motivate, and develop engineers in the evolving ecosystem of serving frameworks like vLLM and LMCache.

What You'll Work On

  • Lead & Own: Take end-to-end ownership of the core inference infrastructure of AMG (WEKA's Augmented Memory Grid), from the NVMe Token Warehouse and GDS data paths to the vLLM/LMCache serving stack, driving technical decisions and delivery outcomes.
  • Technical Direction: Guide a team of engineers through design, implementation, and delivery of high-throughput, low-latency LLM inference systems, setting high standards for code quality, architecture, and reliability.
  • Build at Scale: Stay hands-on across the AMG stack (Python, C++, CUDA, vLLM, NIXL/Dynamo, Kubernetes), contributing directly to production systems while providing technical leadership to the team.
  • Solve Hard Problems: Tackle the real frontier challenges of inference engineering — disaggregated prefill/decode, persistent off-HBM KV caching, RDMA-based transport, and multi-tier GPU memory hierarchies — that define what's possible at scale.
  • Grow People & Teams: Mentor and coach engineers through regular 1:1s, career coaching, and sprint reviews. Foster a culture of ownership, collaboration, and technical excellence within the AMG team.
  • Stay on the Frontier: Track the evolving inference ecosystem, benchmark new tools (SGLang, TRT-LLM, NVIDIA Dynamo), and help the team make timely decisions about when to adopt, build, or pivot.

What We're Looking For

  • Experienced Engineering Leader: 5+ years of professional software engineering, with proven experience leading engineers and owning complex production systems — ideally in AI/ML infrastructure or high-performance computing.
  • Deep AI Inference Background: Hands-on expertise with LLM serving systems — KV cache reuse, disaggregated prefill/decode, continuous batching, and multi-tier GPU memory hierarchies (HBM → NVMe). Strong familiarity with vLLM, LMCache, NIXL/NVIDIA Dynamo, or similar frameworks.
  • Systems Engineering Depth: Strong Python and C++ skills (Rust a plus), with a solid grasp of CUDA, GPU memory management, and high-performance I/O — including GPUDirect Storage (GDS), RDMA, and NVMe data paths.
  • Infrastructure Fluency: Experience deploying and scaling GPU workloads on Kubernetes, with familiarity in RDMA networking, bare-metal GPU clusters (H100/A100), and high-throughput distributed storage.
  • People Leadership: Demonstrated ability to mentor and develop engineers, including running effective 1:1s, supporting career growth, and balancing technical execution with long-term team health.
  • High Bar for Quality: A strong sense of engineering craftsmanship, with a track record of building reliable, high-throughput systems and continuously improving engineering practices.

The WEKA Way:

  • We are Accountable: We take full ownership, always, even when things don’t go as planned. We lead with integrity, show up with responsibility, and hold ourselves and each other to the highest standards.
  • We are Brave: We question the status quo, push boundaries, and take smart risks when needed. We welcome challenges and embrace debates as opportunities for growth, turning courage into fuel for innovation.
  • We are Collaborative: True collaboration isn’t only about working together. It’s about lifting one another up to succeed collectively. We are team-oriented and communicate with empathy and respect. We challenge each other and resolve conflicts constructively. We are transparent about our goals and results. And together, we’re unstoppable.
  • We are Customer Centric: Our customers are at the heart of everything we do. We actively listen and prioritize the success of our customers, and every decision we make is driven by how we can better serve, support, and empower them to succeed. When our customers win, we win.

USA Residents Only: The Total Compensation hiring wage range for this position which the Company reasonably and in good faith expects to pay for the position in the specified geographic areas or locations. Final compensation will be dependent on various factors relevant to the position and candidate such as geographical location, candidate qualifications, certifications, relevant job-related work experience, education, skillset and other relevant business and organizational factors, consistent with applicable law. In addition, the position may include some of the following comprehensive benefits, such as Medical, Dental, Vision, Life, 401(k), Flexible Time Off (FTO), sick time, and leave of absence as per the FMLA and other relevant leave laws.

Concerned that you don’t meet every qualification above?

Studies have shown that women and people of color may be less likely to apply for jobs if they don’t meet every qualification specified. At WEKA, we are committed to building a diverse, inclusive and authentic workplace. If you are excited about this position but are concerned that your past work experience doesn’t match up perfectly with the job description, we encourage you to apply anyway – you may be just the right candidate for this or other roles at WEKA.

WEKA is an equal opportunity employer that prohibits discrimination and harassment of any kind. We provide equal opportunities to all employees and applicants for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.

HQ

WEKA Campbell, California, USA Office

910 East Hamilton Ave, Suite 430, Campbell, CA, United States, 95008
