AI Performance Engineer

Graphcore

Reposted 16 Days Ago

Hybrid

Milpitas, CA, USA

Mid level

The AI Performance Engineer optimizes AI workloads on ARM architectures by analyzing compute requirements, improving distributed systems, and profiling workload performance. Responsibilities include benchmarking and enhancing system communication stacks and collaborating with other teams to ensure efficiency and reliability.

The summary above was generated by AI

About us

Graphcore is one of the world’s leading innovators in Artificial Intelligence compute.
It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.
As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.
Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary

Graphcore’s AI/ML training and inference infrastructure is rapidly scaling to meet the growing demands of AI workloads across mobile, edge, and datacenter environments. This role focuses on optimizing performance across ARM-based architectures and large-scale distributed systems, ensuring efficiency, scalability, and reliability across the full hardware-software stack.

The Team

The System Engineering Performance team architects and optimizes high-performance infrastructure for large-scale datacenter deployments. The team works across hardware, software, networking, and system architecture to deliver cutting-edge AI solutions and ensure optimal system performance at scale.

Responsibilities and Duties

Analyze ML models’ compute and memory requirements using roofline analysis and simulations
Collaborate across hardware and software teams to optimize large-scale AI workloads
Benchmark, monitor, and troubleshoot system performance across distributed systems
Optimize communication stacks including MPI, NCCL, UCX, RDMA, and networking fabrics
Profile and optimize AI workloads, focusing on performance bottlenecks
Develop high-quality, ARM-compatible code and documentation
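
Roofline analysis, named in the first responsibility above, bounds a kernel's attainable throughput by the lower of peak compute and memory bandwidth × arithmetic intensity (FLOP per byte moved). The sketch below is illustrative only: it assumes a hypothetical accelerator with 100 TFLOP/s peak compute and 1 TB/s memory bandwidth, and an idealized GEMM traffic model in which each operand is moved exactly once. The class and function names are made up for this example and are not part of any Graphcore tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical accelerator description (figures are placeholders)."""
    peak_flops: float  # peak compute, FLOP/s
    mem_bw: float      # peak memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) where the memory and compute roofs meet.
        return self.peak_flops / self.mem_bw


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOP/byte for C[m,n] = A[m,k] @ B[k,n], assuming ideal reuse:
    A and B are read once and C is written once (2-byte elements by default)."""
    flops = 2 * m * n * k
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic


def attainable_flops(machine: Machine, intensity: float) -> float:
    # Roofline bound: the lower of the compute roof and the bandwidth slope.
    return min(machine.peak_flops, machine.mem_bw * intensity)


def classify(machine: Machine, intensity: float) -> str:
    return "compute-bound" if intensity >= machine.ridge_point else "memory-bound"


if __name__ == "__main__":
    hw = Machine(peak_flops=100e12, mem_bw=1e12)  # hypothetical: 100 TFLOP/s, 1 TB/s
    shapes = {"square GEMM": (4096, 4096, 4096), "GEMV (n=1)": (4096, 1, 4096)}
    for name, (m, n, k) in shapes.items():
        ai = gemm_intensity(m, n, k)
        tflops = attainable_flops(hw, ai) / 1e12
        print(f"{name}: {ai:.1f} FLOP/B, <= {tflops:.1f} TFLOP/s, {classify(hw, ai)}")
```

With these placeholder figures the ridge point is 100 FLOP/byte. The square GEMM (about 1365 FLOP/byte) therefore sits under the compute roof, while the GEMV (about 1 FLOP/byte) is limited by memory bandwidth.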

Candidate Profile

Essential:

BS/MS in Computer Science, Electrical Engineering, or related field
Experience with distributed systems and communication libraries (MPI, NCCL, UCX, libfabric)
Strong programming skills in C++ and Python
Experience profiling and optimizing HPC or AI/ML workloads
Familiarity with ML benchmarks such as MLPerf

Desirable:

Experience with GPUs or accelerated computing architectures
Knowledge of HPC networking and interconnect technologies (InfiniBand, RoCE)
Familiarity with ML frameworks such as PyTorch or TensorFlow
Understanding of ARM architectures and toolchains
Strong debugging, profiling, and performance optimization skills
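
Work with the communication libraries listed above (MPI, NCCL, UCX, libfabric) often starts from a simple latency-bandwidth (alpha-beta) cost model. The sketch below estimates the time of a ring all-reduce, in which each of p ranks performs 2(p-1) steps that each move nbytes/p, and converts a measured time into the "bus bandwidth" metric reported by the nccl-tests benchmarks (algorithm bandwidth × 2(p-1)/p). The per-step latency and link bandwidth are hypothetical placeholders, and real libraries may pick tree or other algorithms instead of a ring.

```python
def ring_allreduce_time(nbytes: float, p: int, alpha: float, link_bw: float) -> float:
    """Alpha-beta estimate (seconds) for a ring all-reduce of nbytes over p ranks.

    Reduce-scatter plus all-gather take 2(p-1) steps; in each step every rank
    sends one chunk of nbytes/p over its link (alpha = per-step latency in s,
    link_bw = per-link bandwidth in bytes/s)."""
    if p < 2:
        return 0.0  # nothing to exchange with a single rank
    steps = 2 * (p - 1)
    return steps * alpha + steps * (nbytes / p) / link_bw


def bus_bandwidth(nbytes: float, p: int, elapsed: float) -> float:
    """Convert an all-reduce time into bus bandwidth (bytes/s) by applying the
    all-reduce correction factor 2(p-1)/p to the algorithm bandwidth."""
    alg_bw = nbytes / elapsed
    return alg_bw * 2 * (p - 1) / p


if __name__ == "__main__":
    alpha, link_bw, p = 5e-6, 25e9, 8  # hypothetical: 5 us/step, 25 GB/s links, 8 ranks
    for size in (8 * 1024, 64 * 1024**2, 1024**3):
        t = ring_allreduce_time(size, p, alpha, link_bw)
        print(f"{size:>11} B: {t * 1e3:8.3f} ms, bus bw {bus_bandwidth(size, p, t) / 1e9:5.1f} GB/s")
```

With alpha set to zero the modeled bus bandwidth equals the link bandwidth, which is why the metric is convenient for comparing measured runs against hardware limits. Small messages are dominated by the 2(p-1)·alpha latency term instead.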

Similar Jobs at Graphcore

Graphcore

Software Engineer

An Hour Ago

Hybrid

Milpitas, CA, USA

Mid level

Artificial Intelligence • Semiconductor

The Principal Test Framework Software Engineer will design and implement test automation frameworks for hardware validation, collaborating with teams to ensure quality and efficiency in AI hardware systems.

Top Skills: Bash, CI/CD, Git, Linux, Python, Unix

Graphcore

Scientist

An Hour Ago

Remote or Hybrid

Milpitas, CA, USA

Mid level

Artificial Intelligence • Semiconductor

The Principal Reliability Scientist leads reliability activities in high-performance systems, drives experimental design, analyses data for reliability metrics, and collaborates with cross-functional teams to enhance product reliability and serviceability.

Top Skills: AI Hardware, Experimental Design, Reliability Engineering, Reliability Modelling, Statistical Data Analysis, System Reliability

Graphcore

Principal Hardware Diagnostics Engineer

An Hour Ago

Hybrid

Milpitas, CA, USA

Mid level

Artificial Intelligence • Semiconductor

Design, develop, and implement diagnostics software and tools for hardware monitoring and fault diagnosis in AI systems. Collaborate with multiple teams to ensure system reliability.

Top Skills: C#, C++, Linux, Python

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attracts more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine