XPeng Motors

GPGPU Software Architect / Principal Engineer

Reposted 16 Days Ago

In-Office

2 Locations

242K-409K Annually

Senior level

The role involves developing a software stack for GPGPU architecture, focusing on CUDA compatibility, performance modeling, and cross-functional collaboration in AI frameworks. Requires extensive experience in GPU software design.

The summary above was generated by AI

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.

Our pioneering first-generation NPU, built on a domain-specific architecture (DSA), has successfully entered mass production. We are currently validating our second-generation architecture and are making the strategic decision to transition to a General Purpose GPU (GPGPU) architecture.

We're completely overhauling our software stack and embracing the CUDA ecosystem. Our goal is to achieve over 90% compatibility with cuBLAS/cuDNN on Linux across PCIe and CXL connections, all while delivering at least 1.3 times the performance of existing solutions on Transformer and Stable-Diffusion workloads.

Job Responsibilities:

Software Technical Strategy

Develop and refine a comprehensive 3-year roadmap for a software stack compatible with CUDA, encompassing Runtime, Driver, Compiler, Profiler, Debugger, and AI acceleration libraries

Define binding specifications that link our upcoming GPU ISA to CUDA APIs, ensuring forward compatibility with CUDA 12.x features

Evaluate and integrate the latest technological advancements: CUDA Graph, Transformer Engine, virtual memory management, CUDA dynamic parallelism, CUTLASS 3.x, TMA, and Blackwell FP4, among others

Architecture & Design

Create a modular, layered Runtime architecture: CUDA → HAL → Kernel → Hardware, applicable across emulators and actual silicon

Define the task launch protocol, including Queue, Stream, Event, and Graph, as well as the memory model

Design a dual-mode (JIT & offline) compiler supporting LTO, PGO, Auto-Tuning, and efficient PTX→ISA microcode caching

Develop GPU virtualization schemes (MIG) that work across processes and containers

Performance & Observability

Implement an end-to-end performance model: Python API → CUDA Runtime → Driver → ISA → Micro-architecture → Board-level interconnect

Build an observability platform: Nsys-compatible traces, real-time Metric-QPS dashboards, and an AI Advisor for identifying bottlenecks automatically

Manage internal AI benchmarks as the single source of truth. The benchmark suite includes MLPerf Inference, Stable Diffusion XL, and a 70B LLM

Cross-functional Collaboration

Co-design, with our hardware architecture team, an ISA that is compatible with CUDA Compute Capability 12.x

Collaborate with AI framework teams (PyTorch, TensorFlow, JAX, ONNX Runtime) to build fully reusable kernel libraries

Partner with Cloud and K8s teams to co-develop Device Plugins, GPU Operators, and RDMA Network Policies

Minimum Requirements:

10+ years in systems software, with at least 5 years designing CUDA compute stacks

Led the end-to-end development of at least one generation of a GPU Runtime or AI acceleration library

Comprehensive mastery of PTX/SASS, CUDA Driver API, and cuBLAS/cuDNN internals; experience with LLVM NVPTX backend

Profound understanding of GPU micro-architecture, including SM architecture, Warp Scheduler, Shared-Memory conflicts, and Tensor Core pipelines

Proficiency with PCIe/CXL/RDMA topologies, NUMA settings, and GPU Direct RDMA/Storage

The base salary range for this full-time position is $241,800 - $409,200 in addition to bonus, equity and benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.

We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status or marital status or any other prescribed category set forth in federal or state regulations.

Top Skills

AI Acceleration Libraries

cuBLAS

CUDA

CUDA Graph

cuDNN

cuFFT

LLVM

PTX

SASS

Palo Alto, CA, United States, 94301
