
NIO

AI Infrastructure Engineer

Reposted 17 Days Ago
In-Office
San Jose, CA, USA
192K-250K Annually
Senior level
The Sr. AI/LLM Infrastructure Software Engineer will build and optimize LLM/VLM inference systems, lead architectural evaluations, and help define AIOS solutions in vehicles, a role requiring close cross-functional teamwork and strategic planning.

JOB DESCRIPTION

About NIO

NIO is a pioneer and a leading company in the premium smart electric vehicle market. Founded in November 2014, NIO’s mission is to shape a joyful lifestyle. NIO aims to build a community starting with smart electric vehicles to share joy and grow together with users.

NIO designs, develops, jointly manufactures and sells premium smart electric vehicles, driving innovations in next-generation technologies in autonomous driving, digital technologies, electric powertrains and batteries. NIO differentiates itself through its continuous technological breakthroughs and innovations, such as its industry-leading battery swapping technologies, Battery as a Service, or BaaS, as well as its proprietary autonomous driving technologies and Autonomous Driving as a Service, or ADaaS.

NIO’s product portfolio consists of:
  • the ES8, a six-seater smart electric flagship SUV;
  • the ES7 (or the EL7), a mid-large five-seater smart electric SUV;
  • the ES6, a five-seater all-round smart electric SUV;
  • the EC7, a five-seater smart electric flagship coupe SUV;
  • the EC6, a five-seater smart electric coupe SUV;
  • the ET7, a smart electric flagship sedan; and
  • the ET5, a mid-size smart electric sedan.

About the Position
We are looking for a senior AI Inference Infrastructure Software Engineer with strong hands-on experience building, optimizing, and deploying high-performance, scalable inference systems. This position is focused on designing, implementing, and delivering production-grade software that powers real-world applications of Large Language Models (LLMs) and Vision-Language Models (VLMs).
This is an exciting opportunity for an engineer who thrives at the intersection of AI systems, hardware acceleration, and large-scale robust deployment, and who wants to see their contributions ship in production, at scale.
In this role, you will directly shape the architecture, roadmap, and performance of the AI capabilities of our AIOS platform, driving innovations that make LLM/VLM systems fast, efficient, and scalable across cloud, edge, and hybrid edge-cloud environments. You will work closely with system, hardware, and product teams to deliver high-performance inference kernels for hardware accelerators, design scalable inference serving systems, and integrate optimizations such as tensor parallelism and custom kernels into production pipelines. Your work will have immediate impact, powering intelligent automotive systems in the next generation of electric vehicles.
Roles and Responsibilities:
  • Design and implement high-performance, scalable inference systems for LLMs and VLMs across cloud, edge, and edge-cloud hybrid platforms.
  • Develop and optimize custom kernels and operators for specific hardware accelerators (GPU, NPU, DSP, etc.), improving throughput, latency, and memory efficiency.
  • Integrate advanced optimization techniques such as KV-cache management, tensor/model parallelism, quantization, and memory-efficient execution into production inference systems.
  • Partner with system and hardware teams to ensure tight hardware-software integration and optimal performance across diverse compute environments.
  • Translate architectural requirements into robust, maintainable, production-ready software that meets performance, safety, and reliability standards.
  • Define and drive the evolution roadmap for LLM/VLM inference in the AIOS stack, ensuring scalability and adaptability to new workloads.
  • Stay ahead of industry trends and competitor solutions, applying best practices from both AI and large-scale systems engineering.
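To illustrate one of the optimization techniques named above, here is a toy single-head attention decode loop with a KV cache. This is purely an illustrative sketch (not NIO's stack, and all names here are this sketch's own): it shows why caching past keys/values makes autoregressive decoding one projection per new token instead of re-projecting the whole prefix at every step.

```python
# Toy KV-cached decode for single-head attention -- illustrative only.
# Caching K/V makes each decode step O(n) instead of O(n^2) re-projection.
import numpy as np

D = 8  # head dimension (illustrative)

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(D)           # (t,) similarity to each past token
    w = np.exp(scores - scores.max())     # numerically stable softmax
    w /= w.sum()
    return w @ V                          # (D,) weighted sum of values

class KVCache:
    """Appends each step's key/value so past tokens are never re-projected."""
    def __init__(self):
        self.K = np.empty((0, D))
        self.V = np.empty((0, D))

    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

cache = KVCache()
xs = rng.standard_normal((4, D))          # hidden states for 4 decode steps
for x in xs:
    cache.append(x @ Wk, x @ Wv)          # one K/V projection per new token
    out = attend(x @ Wq, cache.K, cache.V)

# Cached result matches recomputing K/V for the full prefix from scratch.
full = attend(xs[-1] @ Wq, xs @ Wk, xs @ Wv)
assert np.allclose(out, full)
```

Production systems layer paging, batching, and eviction policies on top of this basic idea; the sketch only shows the core invariant the cache maintains.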
Required Qualifications:
  • 5+ years of hands-on software development experience in building and optimizing AI inference systems at scale.
  • Direct experience in LLM/VLM model internals, including Transformer-based architectures, inference bottlenecks, and optimization techniques.
  • Strong expertise in performance engineering: kernel development, parallelism strategies, memory optimization, and distributed inference systems.
  • Proficiency with GPU/NPU programming (CUDA or vendor-specific SDKs), compiler toolchains, and deep learning frameworks (PyTorch or TensorFlow).
  • Strong programming skills in C/C++, with a track record of delivering high-performance, production-grade software.
  • Solid foundation in computer architecture, systems programming (CPU/GPU pipelines, memory hierarchy, scheduling), and embedded systems.
  • BS/MS in Computer Science, Computer Engineering, or related technical field.
  • Excellent communication and collaboration skills, with the ability to work across cross-functional teams.
Preferred Qualifications:
  • Master’s or PhD degree in Computer Science, Electrical/Computer Engineering, or a related field, plus 5 years of industry experience.
  • Experience building inference serving systems for large models, including batching, scheduling, caching, and load balancing.
  • Expertise in hardware-aware model optimization (e.g., kernel fusion, mixed precision, quantization, pruning).
  • Familiarity with edge and embedded AI, including real-time constraints and limited-resource optimization.
  • Contributions to widely used AI frameworks, libraries, or performance-critical software (open source or proprietary).
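As a concrete example of the hardware-aware model optimization mentioned above, the sketch below shows a symmetric per-tensor int8 quantization round-trip using absmax scaling. It is an illustrative toy, not a statement about NIO's toolchain; the function names and the per-tensor scheme are this sketch's assumptions.

```python
# Toy symmetric int8 weight quantization round-trip (absmax scaling).
# Illustrative only -- real systems often use per-channel scales, calibration,
# and fused dequantize-in-kernel execution.
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with a single per-tensor absmax scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error per weight is bounded by half the quantization step.
max_err = np.abs(w - w_hat).max()
assert max_err <= scale / 2 + 1e-6
```

The 4x storage reduction (float32 to int8) and the integer matmul it enables are where the throughput and memory wins in the bullets above come from.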

Compensation:

The US base salary range for this full-time position is $192,100.00 - $249,600.00.
  • Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.

  • Please note that the compensation details listed in US role postings reflect the base salary only; they do not include discretionary bonus, equity, or benefits.

Benefits:

Along with competitive pay, as a full-time NIO employee, you are eligible for the following benefits on the first day you join NIO:

  • CIGNA EPO, HSA, and Kaiser HMO medical plans with $0 for Employee Only Coverage.  

  • Dental (including orthodontic coverage) and vision plan.  Both provide options with a $0 paycheck contribution covering you and your eligible dependents.

  • Company Paid HSA (Health Savings Account) Contribution when enrolled in the High Deductible CIGNA medical plan

  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)

  • 401(k) with Brokerage Link option

  • Company paid Basic Life, AD&D, short-term and long-term disability insurance

  • Employee Assistance Program

  • Sick and Vacation time

  • 13 Paid Holidays a year

  • Paid Parental Leave for first 8 weeks at full pay (eligible after 90 days of employment with NIO)

  • Paid Disability Leave for first 6 weeks at full pay (eligible after 90 days of employment with NIO)

  • Voluntary benefits including: Voluntary Life and AD&D options for you, your spouse/domestic partner and dependent child(ren), pet insurance

  • Commuter benefits

  • Mobile Cell Phone Credit

  • HealthJoy mobile benefits app, supporting you and your dependents with benefits and billing questions on the go

  • Free lunch and snacks

  • Onsite gym

  • Employee discounts and perks program

Top Skills

C/C++
GPU/NPU Architectures
Python
PyTorch
TensorFlow

NIO San Jose, California, USA Office

3200 North 1st Street, San Jose, CA, United States, 95134


