NVIDIA

Principal High-Performance LLM Training Engineer

Reposted Yesterday
In-Office
Santa Clara, CA, USA
272K-431K Annually
Expert/Leader
The Principal Engineer will optimize AI training and post-training workloads on NVIDIA platforms, mentor others, and influence architectural decisions.
The summary above was generated by AI

NVIDIA is seeking a Principal Engineer to drive the performance of large-scale AI training and post-training workloads across NVIDIA’s full hardware and software stack. This role sits at the intersection of distributed training, GPU architecture, systems software, deep learning frameworks, and performance engineering. You will analyze and optimize frontier-scale LLM workloads running on thousands of GPUs, drive improvements across frameworks such as PyTorch, JAX, NeMo, and NeMo RL, and use insights from real workloads to help shape future NVIDIA GPU, system, and software roadmaps.

We are looking for a deeply technical leader who can operate across abstraction layers: from application-level training behavior to framework/runtime internals, CUDA libraries, communication collectives, memory systems, networking, and GPU architecture. At this level, success means both improving performance directly and setting technical direction, raising the bar for the organization, and influencing multi-functional decisions across NVIDIA.

What you will be doing:

  • Lead end-to-end performance analysis and optimization of innovative LLM pre-training and post-training workloads on the latest NVIDIA hardware and software platforms.

  • Drive workloads closer to speed-of-light performance by identifying and removing bottlenecks across compute, memory, communication, scheduling, parallelism strategy, kernel efficiency, framework overhead, and system-level scaling.

  • Develop production-quality software, tools, models, benchmarks, and analysis infrastructure that improve training performance, efficiency, and developer velocity across NVIDIA’s AI software stack.

  • Build and refine performance models, workload characterizations, and simulation methodologies to guide future GPU, networking, system, and software architecture decisions.

  • Serve as a technical authority for AI training performance, partnering closely with teams across GPU architecture, systems, CUDA libraries, compilers, networking, frameworks, product management, and applied AI.

  • Translate workload insights into concrete hardware and software recommendations, and advocate for changes that improve performance and efficiency across the AI ecosystem.

  • Mentor and provide technical leadership to engineers across the organization, helping establish best practices for large-scale AI performance analysis and optimization.

What we need to see:

  • An MS or PhD (or equivalent experience) in Computer Science, Electrical Engineering, Computer Engineering, or a related field, with 12+ years of relevant work or research experience.

  • Demonstrated principal-level technical impact in one or more of the following areas: large-scale AI training systems, GPU performance optimization, distributed systems, high-performance computing, ML frameworks, compilers/runtimes, or hardware/software co-design.

  • Deep hands-on experience analyzing and optimizing performance of large-scale deep learning workloads, especially transformer-based models, LLM pre-training, reinforcement learning, fine-tuning, or other post-training workloads.

  • Strong understanding of GPU and AI accelerator architecture from individual accelerators to datacenter-scale systems.

  • Experience with distributed training techniques such as data parallelism, tensor parallelism, pipeline parallelism, expert parallelism, sequence parallelism, activation checkpointing, mixed precision training, and communication/computation overlap.

  • A strong track record of using profiling, tracing, benchmarking, and performance modeling tools to diagnose complex bottlenecks and drive measurable improvements.

  • Excellent communication and technical leadership skills, with the ability to influence architecture and software decisions across multiple teams without relying on direct authority.

GPU computing is the most productive and pervasive platform for deep learning and AI. It begins with the most advanced GPUs and the systems and software we build on top of them. We integrate and optimize every deep learning framework. We work with the major systems companies and every major cloud service provider to make GPUs available in data centers and in the cloud. We craft computers and software to bring AI to edge devices, such as self-driving cars and autonomous robots. AI has the potential to spur a wave of social progress unmatched since the industrial revolution.

This opportunity offers you the ability to collaborate with some of the most forward-thinking and hard-working people in the world, shaping the future of AI in a creative and autonomous work environment that encourages innovation. If you're passionate about working across the full hardware & software stack—from GPU architecture to application code—to achieve optimal performance, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 2, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

HQ

NVIDIA Santa Clara, California, USA Office

2701 San Tomas Expressway, Santa Clara, CA, United States
