Genmo

GPU Performance Engineer

In-Office

San Francisco, CA

Senior level

We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the boundaries of what's possible in video generation.

We're seeking a GPU Performance Engineer to squeeze every last FLOP from our H100 infrastructure and optimize our model serving stack to its absolute limits.

The Role

You'll be our performance optimization expert. You'll use advanced profiling tools to identify bottlenecks and implement solutions that achieve 5-10x speedups. From writing custom CUDA kernels to eliminating cold-start latency, you'll ensure our infrastructure delivers world-class performance. This role is perfect for someone who gets excited about microsecond-level optimizations and pushing hardware to its theoretical limits.

Key Responsibilities

Profile and optimize GPU workloads using Nsight Systems, Nsight Compute, and custom instrumentation
Write high-performance CUDA and Triton kernels for critical model operations
Reduce cold-start latency of our serving infrastructure from seconds to milliseconds
Tune memory access patterns, apply kernel fusion, and maximize GPU utilization
Collaborate with ML engineers to optimize model implementations
Debug performance issues across the full stack, from application to hardware
Implement custom memory pooling and allocation strategies
Share optimization techniques and build a performance-focused culture across teams
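
As a rough illustration of the "custom memory pooling and allocation strategies" item above, here is a minimal Python sketch of a size-class caching allocator. It assumes a simple design: requests are rounded up to a fixed granularity, and freed blocks are kept in per-size free lists for reuse instead of being returned to the device. The class name `CachingPool`, its `malloc`/`free` methods, and the 512-byte granularity are hypothetical choices. Device memory is simulated with a bump pointer rather than real `cudaMalloc` calls, so this shows only the bookkeeping idea and does not describe Genmo's implementation.

```python
from collections import defaultdict


class CachingPool:
    """Toy caching allocator (hypothetical sketch, not a production design).

    Sizes are rounded up to a multiple of `granularity`; freed blocks are
    cached per rounded size and reused, so repeated allocations of similar
    sizes avoid the expensive backing allocator.
    """

    def __init__(self, granularity: int = 512) -> None:
        self.granularity = granularity
        self._free = defaultdict(list)  # rounded size -> cached block offsets
        self._live = {}                 # offset -> rounded size of live block
        self._next_offset = 0           # bump pointer simulating device memory
        self.backing_allocs = 0         # times the simulated device allocator was hit

    def _round(self, nbytes: int) -> int:
        # Round up to the next multiple of the granularity (minimum one unit).
        g = self.granularity
        return max(g, (nbytes + g - 1) // g * g)

    def malloc(self, nbytes: int) -> int:
        size = self._round(nbytes)
        bucket = self._free[size]
        if bucket:
            offset = bucket.pop()         # fast path: reuse a cached block
        else:
            offset = self._next_offset    # slow path: grow the simulated arena
            self._next_offset += size
            self.backing_allocs += 1
        self._live[offset] = size
        return offset

    def free(self, offset: int) -> None:
        size = self._live.pop(offset)     # raises KeyError on double free
        self._free[size].append(offset)   # cache the block instead of releasing it


pool = CachingPool()
a = pool.malloc(1000)  # rounds to 1024 bytes, hits the backing allocator
pool.free(a)
b = pool.malloc(900)   # also rounds to 1024 bytes, reuses the cached block
```

Rounding to a coarse granularity trades some internal fragmentation for a much higher cache hit rate. A real GPU allocator would also need per-stream tracking, block splitting and coalescing, and a policy for releasing cached memory under pressure.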

Qualifications

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field
5+ years of systems programming experience, including 3+ years focused on GPU optimization
Expert proficiency with GPU profiling tools (Nsight Systems, Nsight Compute, and legacy nvprof)
Strong CUDA programming skills, including production kernel development experience
Deep understanding of GPU architecture (memory hierarchy, SMs, warps)
Track record of achieving significant performance improvements (5-10x)
Experience with Python and C++ in production environments

We Value

Experience with Triton kernel development
Knowledge of CUTLASS or similar high-performance libraries
Background in ML-specific optimizations (attention, transformers)
RDMA/InfiniBand optimization experience
Contributions to GPU libraries or frameworks
Low-level debugging skills (PTX/SASS reading)

Genmo is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Genmo, Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.

Top Skills

C++

CUDA

Nsight Systems

nvprof

Python

Triton

2261 Market Street, San Francisco, CA, United States
