Inference Optimization Engineer (local / edge runtime)

Intel

Posted 4 Days Ago

In-Office

Santa Clara, CA, USA

171K-315K Annually

Senior level

Profile and optimize local/edge inference engines (llama.cpp, vLLM) for latency, throughput, and memory on PC/edge hardware. Tune KV cache, batching, and quantization strategy; reduce CPU overhead; improve model lifecycle; benchmark hardware tiers; and contribute upstream fixes to open-source engines.

The summary above was generated by AI

Job Details:

Job Description:

Our Mission

At Intel, our journey is to transform AI into something safer, more trustworthy, and respectful of human privacy by design. We believe transformative AI should have a positive impact on people—powerful in capability, yet honest about its limits and protective of the data and resources it touches.

To get there, we build agentic AI that combines the best of local and cloud intelligence — private, affordable, and sustainable by design. Small, efficient models run directly on the user's machine (AI PC, edge, on-prem, and beyond), keeping data private and token costs low, while powerful cloud models handle the hardest work: planning, reasoning, and complex problem-solving. Today, neither approach can deliver this alone. Together, they give people real capability without compromise—data stays private, spend stays predictable, and energy use stays in check.

We're building intelligence that scales without sacrificing trust, cost, or the planet, because the future of AI should belong to the people it serves.

Role Summary

Make models fast on the hardware people actually own. You optimize inference engines (llama.cpp, vLLM) for constrained local and edge environments: GPUs and iGPUs with Vulkan backends, mostly on PC and edge hardware rather than datacenter H100 environments. KV cache, batching, quantization, scheduling, and CPU-overhead reduction are your daily tools.

This is the rare skill that makes a hybrid, low-cost agent product viable.

What you’ll do

Profile and optimize local inference (llama.cpp-vulkan and vLLM) for latency, throughput, and memory on edge hardware
Tune KV cache, continuous batching, and scheduling for interactive agent workloads
Drive quantization strategy (GGUF / AWQ / GPTQ) and validate quality impact with the Post-Training team
Cut CPU overhead and improve engine startup, model load, and lifecycle (start / stop / health)
Benchmark across hardware tiers and publish honest performance comparisons
Upstream fixes and patches to open-source engines where it helps us

What you’ll learn / grow into

Curiosity is required. You will develop:

An understanding of the internals of modern inference engines and where the milliseconds actually go
Skill in hardware-aware optimization across iGPU / CPU paths (Vulkan, SYCL, oneAPI, CUDA where relevant)
A feel for the quality-vs-speed-vs-memory trade space for small models
A deeper interest in local / edge AI and squeezing hardware

Qualifications:

Minimum qualifications are required to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates.

Required Qualifications

BS/MS in CS, EE, Math or related STEM field
5+ years of software development experience
Strong in C++ and/or Python; comfortable reading systems-level code
Understands how LLM inference works (attention, KV cache, decoding)
Has profiled and optimized real performance problems (CPU or GPU) and can prove the speedup
Linux, build systems, and low-level debugging expertise

Preferred Qualifications

Hands-on with llama.cpp, vLLM, ggml, or similar engines
Experience with GPU / accelerator programming (Vulkan, CUDA, SYCL, Metal) or SIMD / CPU kernels
Familiarity with quantization formats and their quality trade-offs
Open-source contributions to inference engines

Requirements listed would be obtained through a combination of industry-relevant job experience, internship experiences, and/or schoolwork/classes/research.

Benefits at Intel

Our total rewards package goes above and beyond just a paycheck. Whether you're looking to build your career, improve your health, or protect your wealth, we offer generous benefits to help you achieve your goals. Go to Intel Benefits | Intel Careers for details of benefits available to you. Intel reserves the right to modify, change or discontinue benefit plans at any time in its sole discretion.

Job Type:

Shift: Shift 1 (United States of America)

Primary Location: US, California, Santa Clara

Additional Locations: US, Arizona, Phoenix; US, California, Folsom; US, Oregon, Hillsboro

Business group: The Client Computing Group (CCG) is responsible for driving business strategy and product development for Intel's PC products and platforms, spanning form factors such as notebooks, desktops, 2 in 1s, all in ones. Working with our partners across the industry, we intend to deliver purposeful computing experiences that unlock people's potential - allowing each person to use our products to focus, create and connect in ways that matter most to them.

Posting Statement: All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Position of Trust: N/A

Benefits

We offer a total compensation package that ranks among the best in the industry. It consists of competitive pay, stock bonuses, and benefit programs which include health, retirement, and vacation. Find out more about the benefits of working at Intel.

Annual Salary Range for jobs which could be performed in the US: $170,500.00-315,490.00 USD

The range displayed on this job posting reflects the minimum and maximum target compensation for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific compensation range for your preferred location during the hiring process.

Work Model for this Role

This role will be eligible for our hybrid work model, which allows employees to split their time between working on-site at their assigned Intel site and off-site.

* Job posting details (such as work model, location or time type) are subject to change.

ADDITIONAL INFORMATION: Intel is committed to Responsible Business Alliance (RBA) compliance and ethical hiring practices. We do not charge any fees during our hiring process. Candidates should never be required to pay recruitment fees, medical examination fees, or any other charges as a condition of employment. If you are asked to pay any fees during our hiring process, please report this immediately to your recruiter.

Robert Noyce Building, Santa Clara, CA, United States, 95052

2200 Mission College Blvd., Santa Clara, CA, United States, 95054
