Delos Data

System Software Engineer - AI

Reposted 3 Days Ago
Hybrid
Palo Alto, CA, USA
140K-200K Annually
Mid level
As a System Software Engineer, you will design communication primitives for AI models, optimize performance for training workloads, and benchmark performance on clusters.

About us:

We are a stealth-mode startup building foundational technology to address performance, scalability, and resiliency challenges in large-scale AI data center clusters. We are backed by top-tier VC firms and notable angel investors.

The company is led by experienced builders and operators who have founded companies, taken them to scale, and exited successfully. We work with a strong sense of unity and shared responsibility, and we expect trust, integrity, and respect in how we collaborate and make decisions. We hold ourselves accountable to one another and to the quality of the work we deliver.

Headquartered in Silicon Valley, we operate across a mix of remote and on-site locations in the U.S. and Canada. We aim to create an environment where people are treated fairly, supported in their growth, and empowered to do meaningful work alongside others who take the craft seriously.

Some Recent Press:

https://www.eetimes.com/startup-boosts-scale-up-to-1000-gpus-in-a-single-domain/

We are looking for:

We are looking for a talented System Software Engineer to help us redefine the infrastructure layer of AI. In this role, you will bridge the gap between high-level AI frameworks and low-level system software. You will be responsible for designing and implementing the communication and execution primitives that allow large-scale AI models to run efficiently across thousands of GPUs. We are looking for a "builder" who thrives in the early stages of a product’s lifecycle and is passionate about solving the "hard" systems problems of the generative AI era.

Key Responsibilities:

  • Collaborate across the stack to influence the design of our foundational technology, ensuring it meets the needs of next-generation AI models.

  • Identify and resolve performance bottlenecks in distributed training and inference workloads through deep-dive analysis of the software-hardware interface.

  • Conduct rigorous performance benchmarking and characterization on multi-node clusters.

Required Skills and Qualifications:

  • Strong proficiency in C++ and Python, with a deep understanding of systems programming fundamentals (memory management, concurrency, OS internals).

  • Proficiency in a Linux development environment.

Desired Skills:

  • Experience with GPU programming (CUDA) and performance optimization for parallel architectures.

  • Familiarity with distributed AI frameworks (PyTorch, JAX, or DeepSpeed) and/or inference engines (vLLM, SGLang, Dynamo/TRT-LLM).

  • Hands-on experience with large-scale cluster orchestration and telemetry tools.

Education:

  • Bachelor's or Master's degree in Computer Engineering, Computer Science, or a related field.

Location:

This is a hybrid role based in Palo Alto, CA or Charlottesville, VA.

Compensation:

Target base salary for this role is $140,000 to $200,000 per year, plus meaningful equity, benefits, and a 401(k). Our salary ranges are determined by role, level, experience, and location.

We are an equal opportunity employer. We value a range of perspectives and experiences and make employment decisions based on merit and business needs. We do not discriminate on the basis of legally protected characteristics.

Agency Note:

We do not accept resumes from agencies or search firms. Please do not forward candidate profiles through our careers page, email, LinkedIn messages, or directly to company employees. Any resumes submitted will be deemed the property of the company, and no fees will be paid in the event the candidate is hired.
