AI Systems Performance Engineer

Broadcom

Reposted 2 Days Ago

In-Office

San Jose, CA, USA

141K-226K Annually

Senior level

As a Senior AI Systems Performance Engineer, you will benchmark AI workloads, optimize Ethernet fabric performance, and troubleshoot system bottlenecks, focusing on delivering actionable insights across teams.

The summary above was generated by AI

Job Description:

We are seeking a highly talented and experienced Senior AI Fabric Performance Engineer to take on a critical role within our Performance Lab. In this high-impact position, you will drive the performance benchmarking of AI inference, training, and storage workloads, with a focus on our network infrastructure. You will be responsible for generating reports that help customers with deployment and help the marketing team position the product.

While the AI workloads (inference and training) run on our servers, your primary focus will be optimizing the Ethernet fabric that connects them. You will be responsible for executing rigorous performance benchmarks, isolating complex system bottlenecks, and tuning parameters to achieve maximum throughput and minimum latency. If you possess a deep understanding of Ethernet fabric, machine learning system demands, and Linux environments, and you thrive on solving complex performance puzzles, we want you on our team.

Key Responsibilities

Benchmarking & Execution: Install, configure, and run industry-standard AI performance benchmarks, with a strong emphasis on MLPerf (Training and Inference) and NCCL tests.
Fabric Optimization: Tune and optimize network parameters, focusing heavily on Ethernet fabric performance, to ensure seamless data flow for distributed AI workloads running on server clusters.
Deep Debugging: Identify, isolate, and troubleshoot complex system performance bottlenecks spanning the Linux OS, server hardware, and Ethernet switches.
Automation Development: Design, develop, and implement robust performance testing frameworks and automation tools to streamline continuous benchmarking.
Cross-Functional Collaboration: Document test methodologies, communicate performance findings, and provide actionable improvement recommendations to hardware, software, and networking stakeholders.

Required Qualifications

Education: Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field plus 12+ years of related industry experience, or a Master's degree in one of these fields plus 10+ years of related industry experience.
OS Expertise: Deep familiarity and hands-on experience with Linux operating systems, including system-level performance tuning and troubleshooting.
Programming Skills: Strong proficiency in programming and scripting languages, specifically Python and C++.
AI/ML Knowledge: Familiarity with modern machine learning frameworks, particularly PyTorch, and a solid understanding of how AI models consume compute and network resources.
Networking & Fabric: Proven experience in performance testing and validating Ethernet switch systems.
Analytical Capabilities: Extensive experience with performance metrics, profiling, and benchmarking tools. Strong problem-solving skills with a proven ability to diagnose root causes in complex, distributed systems.

Preferred Qualifications (Optional but recommended for a critical role)

Experience with RDMA (Remote Direct Memory Access) and RoCEv2 (RDMA over Converged Ethernet).
Prior experience building CI/CD pipelines for automated hardware or software performance regression testing.
Familiarity with containerization and orchestration tools (Docker, Kubernetes) used in AI deployments.

Additional Job Description:

Compensation and Benefits

The annual base salary range for this position is $141,300 - $226,000.

As a valued member of our team, you'll be eligible for a discretionary annual bonus and the opportunity to receive not only a competitive new hire equity grant, but also annual equity awards, connecting your success directly to the company's growth. All of these are subject to the relevant plan documents and award agreements.

Broadcom offers a competitive and comprehensive benefits package: Medical, dental and vision plans, 401(K) participation including company matching, Employee Stock Purchase Program (ESPP), Employee Assistance Program (EAP), company paid holidays, paid sick leave and vacation time. The company follows all applicable laws for Paid Family Leave and other leaves of absence.

Broadcom is proud to be an equal opportunity employer. We will consider qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, national origin, citizenship, disability status, medical condition, pregnancy, protected veteran status or any other characteristic protected by federal, state, or local law. We will also consider qualified applicants with arrest and conviction records consistent with local law.

If you are located outside the USA, please be sure to fill out a home address, as this will be used for future correspondence.

1320 Ridder Park Drive, San Jose, CA, United States, 95131


