Parasail

Senior Software Engineer, LLM Performance

Reposted 20 Days Ago
In-Office or Remote
6 Locations
Senior level
Parasail is redefining AI infrastructure by enabling seamless deployment across a distributed network of GPUs, optimizing for cost, performance, and flexibility. Our mission is to empower AI developers with a fast, cost-efficient, and scalable cloud experience—free from vendor lock-in and designed for the next generation of AI workloads.

Job Description:

The Senior Software Engineer, LLM Performance plays a crucial role in delivering a competitive platform by focusing on efficiently scheduling, executing, and managing AI workloads on distributed compute systems. This role is deeply technical, spanning from low-level GPU kernels to distributed AI orchestration and Kubernetes (K8s) deployments. It is about more than optimization; it’s about pioneering efficient infrastructure that supports AI’s transformative role in reshaping productivity, revolutionizing industries, and addressing some of the world’s most challenging problems. You’ll ensure that generative AI — including large language models (LLMs), multi-modal models, and diffusion models — operates efficiently at enterprise scale while driving continuous improvements in cost, performance, and sustainability.

Responsibilities:

  • Add support for new LLMs, working across the stack from low-level GPU kernels to Kubernetes-based deployments.

  • Contribute to cutting-edge open-source LLM engines such as vLLM or SGLang to extend their capabilities and performance (e.g., using Python to improve API servers or request schedulers).

  • Operate closer to the hardware, focusing on building and integrating solutions to boost performance and hardware utilization. For example, improve attention backends like FlashAttention or FlashInfer by contributing to their development and optimization, or by integrating their solutions into vLLM.

  • Improve LLM performance using advanced algorithmic techniques such as speculative decoding, quantization, or other state-of-the-art methods, and understand the impact of these techniques on model quality.

Qualifications:

  • Expertise in GPU computing, including platforms and frameworks such as CUDA, ROCm, XLA, PyTorch, and JAX.

  • Background in performance analysis and optimization of AI/HPC workloads (e.g., profiling, or theoretical analysis of FLOPs and bandwidth).

  • Experience writing GPU kernels using technologies such as CUDA, CUTLASS, or Triton.

  • Strong skills in Python and C++.

  • Demonstrated contributions to open-source projects. Contributions to inference engines such as vLLM are a strong plus.

  • A production-oriented mindset emphasizing robust, scalable code suitable for enterprise-grade applications.

  • A relentless curiosity about cutting-edge AI technologies combined with a passion for solving complex problems.

What You Bring to the Table: We are looking for people who are eager to learn and master the lower-level compute concepts that are critical to the AI revolution. Your work here will go beyond writing code: it will directly shape the scalability and efficiency of AI applications at large. If you're ready for the challenge of optimizing AI performance and pushing our technology to new heights, we're excited to welcome you aboard.

Top Skills

C++
CUDA
CUTLASS
FlashAttention
FlashInfer
JAX
Kubernetes
Python
PyTorch
ROCm
SGLang
Triton
vLLM
XLA

HQ

Parasail San Mateo, California, USA Office

4 W 4th Ave, San Mateo, California, United States, 94402-1619
