Cerebras Systems

Senior Research Engineer - Inference ML

Sorry, this job was removed at 10:04 p.m. (PST) on Tuesday, Mar 10, 2026
In-Office
Sunnyvale, CA, USA

Similar Jobs

An Hour Ago
Remote or Hybrid
Senior level
Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Manage and grow ServiceNow partner relationships across Canada: build partner practices, set targets, drive governance, enablement, reporting, business reviews, remediation plans, and achieve joint revenue goals while coaching partners and collaborating with global teams.
Top Skills: AI, ServiceNow
An Hour Ago
Hybrid
75K-85K Annually
Senior level
Automotive • Hardware • Robotics • Software • Transportation • Manufacturing
Lead MES implementation and support across plants, translate business requirements into MES/ERP integrations, maintain data integrity and reporting, troubleshoot OT/IT connectivity, and ensure compliance with quality and cybersecurity standards to improve production performance.
Top Skills: Claroty, Epicor Cms, Funnelcloud, Hmis, Iiot, Labeling Systems, Oee, Power BI, SAP, Scanners, SQL, Switches, Tcp/Ip, Traceability Tools, Vlans, Wi-Fi
An Hour Ago
Hybrid
36-40 Hourly
Junior
Automotive • Hardware • Robotics • Software • Transportation • Manufacturing
Set up and operate RIM molding presses, perform first-off checks, troubleshoot process and quality issues, start/stop press operations, respond to downtime, escalate to support teams, report inaccuracies, attend process meetings, and participate in continuous improvement while following safety and quality procedures.
Top Skills: Abs, Asa, Crane, Ddc, Decoupled Molding, Excel, Pc-Abs, Rjg Mold, Rjg Systematic, Towmotor, Tpo

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.  

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, transforming key workloads with ultra-high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

About The Role 

As a Senior Research Engineer on the Inference ML team at Cerebras Systems, you will adapt today's most advanced language and vision models to run efficiently on our flagship Cerebras architecture. You'll work alongside ML researchers and engineers to design, prototype, validate, and optimize models, gaining end-to-end exposure to cutting-edge inference research on the world's fastest AI accelerator. 

You will focus on pushing the frontier of speculative decoding, large-model pruning and compression, sparse attention, and sparsity-driven techniques to deliver low-latency, high-throughput inference at scale.

Responsibilities 

  • Design, implement, and optimize state-of-the-art transformer architectures for NLP and computer vision on Cerebras hardware. 
  • Research and prototype novel inference algorithms and model architectures that exploit the unique capabilities of Cerebras hardware, with emphasis on speculative decoding, pruning/compression, sparse attention, and sparsity. 
  • Train models to convergence, perform hyperparameter sweeps, and analyze results to inform next steps. 
  • Bring up new models on the Cerebras system, validate functional correctness, and troubleshoot any integration issues. 
  • Profile and optimize model code using Cerebras tools to maximize throughput and minimize latency. 
  • Develop diagnostic tooling or scripts to surface performance bottlenecks and guide optimization strategies for inference workloads. 
  • Collaborate across teams, including software, hardware, and product, to drive projects from inception through delivery. 

Minimum Qualifications 

  • One of the following education and experience combinations: 
    • Bachelor’s degree in Computer Science, Software Engineering, Computer Engineering, Electrical Engineering, or a related technical field AND 7+ years of ML software development experience, OR 
    • Master’s degree in Computer Science or related technical field AND 4+ years of software development experience, OR 
    • PhD in Computer Science or related technical field with 2+ years of relevant research or industry experience, OR 
    • Equivalent practical experience. 
  • 4+ years of experience testing, maintaining, or launching software products, including 2+ years of experience with software design and architecture. 
  • 3+ years of experience in software development focused on machine learning (e.g., deep learning, large language models, or computer vision). 
  • Strong programming skills in Python and/or C++.
  • Experience with Generative AI and Machine Learning systems. 
  • Evidence of research impact in machine learning, such as publications at top conferences (NeurIPS, ICLR, ICML, ACL, EMNLP, MLSys) or comparable contributions to widely used open-source projects or high-quality preprints.

Preferred Qualifications 

  • Master’s degree or PhD in Computer Science, Computer Engineering, or a related technical field. 
  • Experience independently driving complex ML or inference projects from prototype to production-quality implementations. 
  • Hands-on experience with relevant ML frameworks such as PyTorch, Transformers, vLLM, or SGLang. 
  • Experience with large language models, mixture-of-experts models, multimodal learning, or AI agents. 
  • Experience with speculative decoding, neural network pruning and compression, sparse attention, quantization, sparsity, post-training techniques, and inference-focused evaluations. 
  • Familiarity with large-scale model training and deployment, including performance and cost trade-offs in production systems. 
  • Triton/CUDA experience is a big plus.

Required Skills & Attributes 

  • Proficiency with at least one major ML framework (PyTorch, Transformers, vLLM, or SGLang). 
  • Deep understanding of transformer-based models in language and/or vision domains, with demonstrated experience implementing and optimizing them. 
  • Proven ability to translate research ideas into robust code: implementing new model variants, training strategies, and evaluation workflows end-to-end.
  • Strong foundation in performance optimization on specialized hardware (e.g., GPUs, TPUs, or HPC interconnects). 
  • Deep understanding of modern ML architectures and strong intuition for optimizing their performance, particularly for inference workloads using sparse attention, pruning/compression, and speculative decoding. 
  • Track record of owning problems end-to-end and autonomously acquiring whatever knowledge is needed to deliver results. 
  • Self-directed mindset with a demonstrated ability to identify and tackle the most impactful problems. 
  • Collaborative approach with humility, eagerness to help colleagues, and commitment to team success. 
  • Genuine passion for AI and a drive to push the limits of inference performance. 
  • Hybrid role in Toronto, ON, CA or Sunnyvale, CA, USA.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

  1. Build a breakthrough AI platform beyond the constraints of the GPU.
  2. Publish and open source their cutting-edge AI research.
  3. Work on one of the fastest AI supercomputers in the world.
  4. Enjoy job stability with startup vitality.
  5. Be part of a simple, non-corporate work culture that respects individual beliefs.

Read our blog: Five Reasons to Join Cerebras in 2026.

Apply today and join us at the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

This website or its third-party tools process personal data. For more details, click here to review our CCPA disclosure notice.

HQ

Cerebras Systems Sunnyvale, California, USA Office

1237 E Arques Ave, Sunnyvale, CA, United States, 94085

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
