
Genmo

GPU Performance Engineer

Reposted 9 Days Ago
In-Office
San Francisco, CA
Senior level

We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the boundaries of what's possible in video generation.

We're seeking a GPU Performance Engineer to squeeze every last FLOP from our H100 infrastructure and optimize our model serving stack to its absolute limits.
The Role
You'll be our performance optimization expert, using advanced profiling tools to identify bottlenecks and implementing solutions that achieve 5-10x speedups. From writing custom CUDA kernels to eliminating cold start latency, you'll ensure our infrastructure delivers world-class performance. This role is perfect for someone who gets excited about microsecond optimizations and pushing hardware to its theoretical limits.
Key Responsibilities

  • Profile and optimize GPU workloads using Nsight Systems, nvprof, and custom instrumentation

  • Write high-performance CUDA and Triton kernels for critical model operations

  • Reduce cold start latency in our serving infrastructure from seconds to milliseconds

  • Tune memory access patterns, apply kernel fusion, and maximize GPU utilization

  • Collaborate with ML engineers to optimize model implementations

  • Debug performance issues across the full stack from application to hardware

  • Implement custom memory pooling and allocation strategies

  • Share optimization techniques and build performance culture across teams

Qualifications

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field

  • 5+ years of systems programming experience, including 3+ years focused on GPU optimization

  • Expert proficiency with GPU profiling tools (Nsight Systems, nvprof)

  • Strong CUDA programming skills, including production kernel development

  • Deep understanding of GPU architecture (memory hierarchy, SMs, warps)

  • Track record of achieving significant performance improvements (5-10x)

  • Experience with Python and C++ in production environments

We Value

  • Experience with Triton kernel development

  • Knowledge of CUTLASS or similar high-performance libraries

  • Background in ML-specific optimizations (attention, transformers)

  • RDMA/InfiniBand optimization experience

  • Contributions to GPU libraries or frameworks

  • Low-level debugging skills (PTX/SASS reading)

Genmo is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Genmo, Inc. is an E-Verify company and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.

Top Skills

C++
CUDA
Nsight Systems
nvprof
Python
Triton

Genmo San Francisco, California, USA Office

2261 Market Street, San Francisco, CA, United States

Similar Jobs

11 Days Ago
Hybrid
4 Locations
217K-307K Annually
Senior level
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
As a Senior C++ Software Engineer, you will optimize GPU performance, develop benchmarks, analyze performance metrics, and support the software development teams.
Top Skills: C++, CUDA, Databricks, Linux, Looker, Nsight, SQL, TensorRT, XLA
8 Days Ago
Hybrid
4 Locations
168K-239K Annually
Mid level
Artificial Intelligence • Machine Learning • Robotics • Software • Transportation • Design • Manufacturing
As a GPU Performance Software Engineer, you will optimize GPU algorithms, monitor performance, and support teams in enhancing compute utilization on next-gen architectures.
Top Skills: C++, CUDA, Databricks, Linux, Looker, Nsight, OpenGL, SQL, TensorRT, XLA
7 Days Ago
Easy Apply
In-Office
3 Locations
315K-560K Annually
Mid level
Artificial Intelligence • Natural Language Processing • Generative AI
As a GPU Performance Engineer, you will develop and implement systems for GPU optimization, enhance performance for large language models, and address complexities in hardware and software integration.
Top Skills: CUDA, CUTLASS, GPU Programming, JAX, NCCL, NVLink, PyTorch, Triton, XLA

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
