focuskpi

AI Infrastructure & Experience Engineer

Posted 4 Days Ago
In-Office
Mountain View, CA, USA
70-79 Hourly
Mid level
Deploy, optimize, and integrate LLMs and multimodal models on local GPU/ARM64 hardware. Develop custom CUDA kernels, tune inference (TTFT, tokens/sec), connect backends to orchestration layers, build prototypes and frontends, and implement device communication protocols for local AI compute.
The summary above was generated by AI

FocusKPI is seeking an AI Infrastructure & Experience Engineer to join one of our clients, a high-tech SaaS company. 

Work Location: Mountain View, CA (onsite, 5 days/week)
Duration: 4-month contract 
Pay Range: $70-$79/hr
**No C2C resumes are considered**
 

Position Responsibilities:

  • Inference Optimization: Deploy and tune multiple LLMs and generative multimodal models on local inference hardware. Optimize performance metrics (TTFT, tokens/sec) via model quantization, caching strategies, and architecture-specific adjustments.
  • Systems Engineering & CUDA: Apply deep knowledge of the CUDA environment to build custom kernels that maximize utilization of low-cost GPU compute.
  • Orchestration & Integration: Seamlessly bridge inference backends with orchestration layers (LiteLLM, Ollama, etc.) and frontends like OpenWebUI.
  • Rapid Prototyping: Build functional, high-fidelity demos showcasing model memory capabilities, agentic workflows, and context-aware web search.
  • Peripheral Connectivity: Implement communication protocols to bridge local AI compute with peripheral devices, including smart TVs, household appliances, and XR hardware.

Requirements/Technical Qualifications:

  • Recent experience in model optimization is required
  • Hardware & Compute: Proven experience with NVIDIA ecosystems and ARM64 architecture.
  • Systems Programming: Advanced proficiency in C++, Python, and Rust. Deep familiarity with CUDA and the ability to author/debug custom CUDA kernels for compute-intensive tasks.
  • AI/ML Frameworks: Extensive experience with modern inference engines (llama.cpp, TensorRT-LLM, Ollama) and orchestration frameworks (LiteLLM).
  • Software Engineering: Robust understanding of asynchronous programming (FastAPI), containerization (Docker/Kubernetes), sandbox environments, and API design for low-latency communication.
  • Full-Stack Prototyping: Ability to quickly spin up modern frontend UIs (React, Next.js, or similar) to present AI-driven intelligence to end users.
  • Communication Protocols: Familiarity with WebSockets, gRPC, and REST for device-to-device communication in a local network environment.
  • Overall mandatory skills: recent model optimization experience, inference optimization, NVIDIA ecosystems, custom CUDA kernel development, ARM64 architecture, Python

Ideal Candidate Profile:

  • A minimum of 3 years of relevant industry experience is required
  • The "Builder" Mindset: You are energized by the prospect of building proofs-of-concept in days rather than months. You thrive in environments where speed and creativity are paramount.
  • Problem Solver: You approach unsolved, messy engineering challenges with enthusiasm rather than trepidation.
  • Architectural Vision: You see the "big picture" of how AI becomes part of consumers' daily lives, not just how the model generates text.
  • Agile & Adaptable: You are comfortable working in a fast-paced environment where priorities shift based on the results of rapid experimentation.
  • A degree in Computer Science, Machine Learning, or Artificial Intelligence is preferred but not required

Thank you!

FocusKPI Hiring Team

Founded in 2010, FocusKPI, Inc. (FocusKPI) is a data science and technology firm specializing in predictive analytics practice and methodologies. FocusKPI is a US company headquartered in Silicon Valley, California, with an East Coast office in Boston, Massachusetts.

HQ

focuskpi Santa Clara, California, USA Office

1800 Wyatt Dr, Santa Clara, CA, United States, 95054


