NVIDIA

Senior Software Engineer, CUDA Deep Learning Systems

Posted 23 Days Ago
In-Office or Remote
2 Locations
184K-357K Annually
Senior level
Develop and optimize high-performance CUDA kernels and distributed AI systems for deep learning applications, collaborating with researchers to enhance model performance and efficiency.
The summary above was generated by AI

We are looking for an experienced and highly motivated software professional to work on pioneering initiatives and projects at the intersection of CUDA and Deep Learning Systems. As the complexity and scale of artificial intelligence continue to grow, the intersection of advanced deep learning architectures, massive-scale distributed computing, and low-level hardware optimization has never been more critical. Our team is dedicated to exploring and prototyping next-generation ideas that bridge the gap between deep learning algorithms and CUDA, pushing the boundaries of what is possible on modern accelerator architectures.

Join our dynamic, research-oriented team to help unlock maximum hardware performance for emerging AI workloads. You will be a crucial member of a highly technical group exploring uncharted territories in model optimization, custom kernel development, and cluster-scale AI systems design. If you are passionate about the fundamentals of deep learning and thrive on squeezing every ounce of performance out of advanced computing systems, from a single GPU to supercomputer clusters, we want you on our team!

What you will be doing:

  • Explore, research, and prototype novel systems optimizations for advanced deep learning models at the intersection of high-level DL frameworks and low-level CUDA through modeling, simulation, and silicon prototyping.

  • Architect and optimize distributed computing systems that scale seamlessly from a single node to massive, cluster-scale supercomputing environments.

  • Design, implement, and optimize custom high-performance CUDA kernels tailored to emerging neural network architectures and workloads.

  • Analyze complex hardware-software interactions to identify and resolve performance bottlenecks in both training and inference pipelines.

  • Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors, and CUDA driver experts to co-design systems and algorithms that improve accelerator compute utilization, memory bandwidth, cross-node network communication efficiency, and programmability.

  • Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.

  • Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.

What we need to see:

  • BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience).

  • 8+ years of relevant industry experience or equivalent academic experience after degree achievement.

  • Strong proficiency in C++ and Python programming.

  • Solid background in the fundamentals of Deep Learning with a focus on transformers.

  • Strong understanding of distributed computing principles, multi-node scaling, and the unique performance challenges of cluster-scale execution.

  • Proven experience in systems programming, computer architecture, and low-level systems performance optimization.

  • Familiarity with deep learning accelerator architectures such as the GPU and hands-on experience with CUDA programming and kernel optimization.

  • A strong analytical approach with experience using profiling tools to deeply understand software performance on hardware.

  • Experience profiling and optimizing innovative vision models, generative AI architectures, or diffusion models.

  • Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile).

Ways to stand out from the crowd:

  • Deep expertise in the performance internals and execution graphs of major deep learning autograd, training, and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText).

  • Hands-on experience with CUDA, communication libraries (e.g., NCCL, MPI, UCX) and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism).

  • Knowledge of numerical methods, low-precision arithmetic (e.g., NVFP4, MXFP4, FP8, INT8), and their implications on deep learning model accuracy and performance.

  • Familiarity with systems requirements for Reinforcement Learning (RL) or highly parallel simulation environments and/or research background in machine learning systems or adjacent fields.

  • Experience with machine learning, especially agentic systems, applied to systems problems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 18, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

HQ

NVIDIA Santa Clara, California, USA Office

2701 San Tomas Expressway, Santa Clara, CA, United States
