Senior AI Infrastructure Engineer

Sorry, this job was removed at 06:09 p.m. (PST) on Tuesday, Aug 19, 2025

In-Office

Santa Clara, CA

We are now seeking a Senior AI Infrastructure Engineer! NVIDIA’s Compute Architecture Group is growing its team of AI-focused infrastructure engineers who run our internal cluster for accelerated AI and software development. As part of this team, you will help manage a diverse cluster of GPU-accelerated systems. Your contributions will enable engineers to work efficiently with a wide variety of forward-looking hardware configurations as they vigilantly seek out opportunities for performance optimization and continuously deliver high-quality software.

Our ideal candidate is versatile enough to apply expertise from many domains: system administration, performance analysis, automation, and architecture. Your work will enable the groundbreaking experimentation that allows us to design the world’s most powerful systems for the most demanding computing applications. You will have a meaningful impact at a fast-moving company that is spearheading the next wave in computing technology. Join our technically diverse team of GPU architects, software engineers, and infrastructure experts to unlock unprecedented performance in every domain!

What you'll be doing:

Administer an internal NVIDIA AI cluster composed of Linux systems ranging from the world’s most powerful servers to embedded systems
Maintain the configuration of our resource management system (Slurm) to keep resource allocation efficient and aligned with organizational priorities
Automate configuration management, software updates, and maintenance of system availability using modern DevOps tools (Ansible, GitLab, etc.)
Plan and maintain new systems that support the NVIDIA software stack
Work directly with developers and hardware architects to debug issues, identify new requirements, and improve workflows
Actively communicate with users and management regarding resource planning and allocation

What we need to see:

5+ years of experience deploying and administering large-scale clusters tuned for AI development
MS in Computer Science, Computer Engineering, or EECE; or a BS (or equivalent experience)
Deep knowledge of distributed resource scheduling systems (Slurm preferred; LSF, etc.)
Demonstrated ability to script in bash, and at least one high-level language (Python preferred)
Experience with container technologies (Docker, Singularity, etc.)
Deep understanding of operating systems, computer networks, and high-performance hardware
Ability to work well with developers, hardware architects, and test engineers
Passionate dedication to providing quality support for users

Ways to stand out from the crowd:

Prior work experience managing high-performance fabrics and parallel file systems
Familiarity with CUDA and managing GPU-accelerated computing systems
Basic knowledge of deep learning frameworks and algorithms

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 144,000 USD - 230,000 USD for Level 3, and 168,000 USD - 270,250 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 29, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

2701 San Tomas Expressway, Santa Clara, CA, United States

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (PitchBook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
