
NVIDIA

AI Software Engineer, LLM Inference Performance Analysis - New College Grad 2026

Reposted 3 Days Ago
In-Office
4 Locations
124K-219K Annually
Entry level

NVIDIA is at the forefront of the generative AI revolution. We are looking for a Software Engineer in Performance Analysis and Optimization for LLM Inference to join our performance engineering team. In this role, you will focus on improving the efficiency and scalability of large language model (LLM) inference on NVIDIA Computing Platforms through compiler- and kernel-level analysis and optimization. You will work on key components spanning IR-based compiler optimization, graph-level transformations, and precompiled kernel performance tuning to deliver best-in-class inference speed and efficiency.

As a core contributor, you will collaborate with groups passionate about compiler, kernel, hardware, and framework development. You will analyze performance bottlenecks, develop new optimization passes, and validate gains through profiling and projection tools. Your work will directly influence the runtime behavior and hardware utilization of next-generation LLMs deployed across NVIDIA’s data center and embedded platforms.

What you'll be doing:

  • Analyze the performance of LLMs running on NVIDIA Compute Platforms using profiling, benchmarking, and performance analysis tools (an illustrative profiling sketch follows this list).

  • Understand and find opportunities in compiler optimization pipelines, including IR-based compiler middle-end optimizations and kernel-level transformations.

  • Design and develop new compiler passes and optimization techniques to deliver best-in-class, robust, and maintainable compiler infrastructure and tools.

  • Collaborate with hardware architecture, compiler, and kernel teams to understand how hardware and software co-design enables efficient LLM inference.

  • Work with globally distributed teams across compiler, kernel, hardware, and framework domains to investigate performance issues and contribute to solutions.
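
As a concrete (and purely illustrative) starting point for the profiling work above, the sketch below times a stand-in transformer block with PyTorch's built-in profiler. The module, tensor shapes, and iteration count are placeholder assumptions, not NVIDIA tooling or a prescribed workflow.

import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder workload: a single transformer encoder layer standing in for an LLM block.
block = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).eval()
x = torch.randn(4, 128, 512)  # (batch, sequence, hidden) -- illustrative sizes only

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    block, x = block.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

# Record per-operator CPU/GPU time to see which kernels dominate inference.
with torch.no_grad(), profile(activities=activities, record_shapes=True) as prof:
    for _ in range(10):
        block(x)

sort_key = "self_cuda_time_total" if torch.cuda.is_available() else "self_cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))

A table like this points at the most expensive operators; deeper analysis would typically move to kernel-level tools such as Nsight Systems or Nsight Compute.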

What we need to see:

  • Master’s or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.

  • Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.

  • Foundational understanding of modern deep learning models (including transformers and LLMs) and interest in inference performance and optimization.

  • Exposure to compiler concepts such as intermediate representations (IR), graph transformations, scheduling, or code generation through coursework, research, internships, or projects (a toy graph-rewrite sketch follows this list).

  • Familiarity with at least one deep learning framework or compiler/runtime ecosystem (e.g., TensorRT-LLM, PyTorch, JAX/XLA, Triton, vLLM, or similar).

  • Ability to analyze performance bottlenecks and reason about optimization opportunities across model execution, kernels, and runtime systems.

  • Experience working on class projects, internships, research, or open-source contributions involving performance-critical systems, compilers, or ML infrastructure.

  • Strong communication skills and the ability to collaborate effectively in a fast-paced, team-oriented environment.
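
To make the compiler-concepts bullet above concrete, here is a toy, hypothetical graph-level rewrite written against torch.fx (one of several IR layers one might encounter; the module and the pass are illustrative assumptions, not part of any NVIDIA toolchain). It traces a small module into a graph IR and removes a redundant "add zero" node, the same shape of work as a real optimization pass.

import operator
import torch
import torch.fx as fx

class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(16, 16)

    def forward(self, x):
        y = self.proj(x) + 0.0  # redundant add that a peephole pass can remove
        return torch.relu(y)

gm = fx.symbolic_trace(Toy())

# Peephole pass: drop add(x, 0.0) nodes and rewire their users to the original input.
for node in list(gm.graph.nodes):
    if node.op == "call_function" and node.target is operator.add and node.args[1] == 0.0:
        node.replace_all_uses_with(node.args[0])
        gm.graph.erase_node(node)

gm.graph.lint()
gm.recompile()
print(gm.code)  # the regenerated forward() no longer contains the add

Production compilers do this kind of rewrite on richer IRs (for example MLIR dialects), with cost models and legality checks, but the node-matching and rewiring pattern is the same.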

Ways to stand out from the crowd:

  • Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.

  • Showcase innovative applications of agentic AI tools that enhance productivity and workflow automation.

  • Active engagement with the open-source LLVM or MLIR community to ensure tighter integration and alignment with upstream efforts.

NVIDIA is recognized as one of the world’s most desirable engineering environments, built by teams who value technical depth, innovation, and impact. We work alongside some of the best minds in GPU computing, systems software, and AI. If you’re driven by performance, enjoy solving sophisticated problems, and thrive in an environment that rewards initiative and technical excellence, we’d love to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 124,000 USD - 195,500 USD for Level 2, and 152,000 USD - 218,500 USD for Level 3.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until January 18, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

C++
CUDA
JAX/XLA
Python
PyTorch
TensorRT-LLM
Triton
vLLM
HQ

NVIDIA Santa Clara, California, USA Office

2701 San Tomas Expressway, Santa Clara, CA, United States
