Persimmons, Inc.
Technical Lead, Runtime Software/Hardware (Spatial AI Accelerator)
Who we are:
Persimmons is building the infrastructure that will power the next decade of AI. Founded in 2023 by veteran technologists from the worlds of semiconductors, AI systems, and software innovation, we're on a mission to enable smarter devices, more sustainable data centers, and entirely new applications the world hasn't imagined yet.
Why join us:
We're growing fast and looking for bold thinkers, builders, and curious problem-solvers who want to push the limits of AI hardware and software. If you're ready to join a world-class team and play a critical role in making a global impact, we want to talk to you.
Summary of Role:
Persimmons seeks a multidisciplinary Technical Lead for runtime software/hardware and compiler integration, focused on our next-generation custom spatial AI accelerator. You will architect and guide the runtime system bridging compiler, host, driver, device firmware, and control hardware, enabling high-performance, robust, and scalable execution of modern AI workloads.
This is a hands-on technical leadership role spanning system design, cross-stack engineering, technical mentorship, and collaboration with compiler, ML framework, and hardware teams.
What you’ll do:
- Architect, design, and implement the runtime stack for Persimmons' custom spatial accelerator, covering host drivers, device runtime, and hardware/firmware control loops.
- Lead technical direction and decisions for runtime–hardware interface, device work and command queue infrastructure, and memory management.
- Coordinate with compiler/backend, ML systems, and hardware architects to ensure seamless end-to-end ML model execution.
- Define and co-design the hardware support features essential to the runtime: queueing structures, synchronization primitives, interrupt/event signaling, and mechanisms for dispatching and orchestrating ML workloads on the spatial execution fabric.
- Drive performance analysis, develop tooling for tracing and bottleneck identification, and deliver runtime-level optimizations for latency, throughput, and hardware utilization.
- Build and mentor a cross-disciplinary engineering team focused on runtime and system validation—establishing best practices, technical standards, and robust software-hardware collaboration.
- Champion efficient tooling, simulation/emulation environments, and test infrastructure for system validation and robust runtime dev/debug.
Requirements
*We do not expect candidates to meet all of the requirements listed below; strong candidates will demonstrate expertise in several key areas.*
- Deep experience architecting runtime software, device firmware, hardware interfaces, or control systems for AI accelerators and/or high-performance SoCs.
- Hands-on expertise developing drivers, resource managers, command/queue control, and dispatching and synchronization primitives (queues, barriers, event notifications) for custom hardware.
- Strong understanding of C/C++ multi-threaded programming and concurrent system design, including experience developing and debugging software that leverages threads, synchronization primitives, and parallel runtime constructs to maximize hardware utilization and performance in latency- and throughput-sensitive environments.
- Solid understanding of hardware–software co-design principles: memory hierarchies, DMA engines, interconnects, job scheduling, on-device synchronization.
- Experience integrating kernel libraries into device runtime stacks—connecting optimized compute kernels (such as SIMD operations and common AI operator libraries) to runtime software through well-defined APIs, efficient scheduling, and careful memory/resource management.
- Experience with modern large language model (LLM) inference servers and serving stacks (e.g., vLLM, TensorRT-LLM, Triton Inference Server, Hugging Face Text Generation Inference, Ray Serve), including their architecture, runtime scheduling, memory management, batching, streaming, and distributed deployment. Understanding of how runtime design, kernel integration, and hardware acceleration impact performance, scalability, and latency in LLM serving workloads.
- Experience with system-level performance tuning, debugging complex hardware–software interactions, and building scalable test/validation infrastructure.
- Deep understanding of and 5+ years of experience in C/C++; familiarity with hardware description languages (Verilog/VHDL/SystemVerilog) or firmware development is a strong plus.
- Drive for innovation—keeping up with new architectures, techniques, and runtime models in ML or spatial computing.
Benefits
- Competitive salary and benefits package
- Flexible PTO
- 401(k)
Please note: Our organization does not accept unsolicited candidate submissions from external recruiters or agencies. Any such submissions, regardless of form (including but not limited to email, direct messaging, or social media), shall be deemed voluntary and shall not create any express or implied obligation on the part of the organization to pay any fees, commissions, or other compensation. Direct contact of employees, officers, or board members regarding employment opportunities is strictly prohibited and will not receive a response.
Location: Persimmons, Inc. — San Jose, California, United States, 95054