Graphcore

Staff Systems Engineer

Posted 2 Days Ago
Hybrid
Milpitas, CA, USA
Mid level
The Staff Systems Engineer will troubleshoot and support Graphcore's AI hardware platforms, ensuring performance and reliability through collaboration and validation processes.
About us

Graphcore is one of the world’s leading innovators in Artificial Intelligence compute. It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.


Job Summary

We are seeking a Staff Systems Engineer to provide advanced operational, diagnostic, and engineering support for Graphcore’s Arm-based hardware platforms across lab and data center environments.

This role focuses on supporting hardware bring-up, validation, and troubleshooting of complex AI compute platforms, including server blades, racks, and rack-scale infrastructure. The successful candidate will collaborate closely with engineering, platform, and data center teams to ensure the reliability and performance of next-generation AI systems.


The Team

The Systems Engineering and Hardware Engineering teams are responsible for enabling the bring-up, validation, and operational reliability of Graphcore’s AI infrastructure platforms.

The team works closely with server engineering, firmware teams, platform architects, and data center operations to support the development, testing, and deployment of next-generation AI compute systems.

This collaborative environment enables rapid problem-solving and continuous improvement of Graphcore’s hardware platforms from early development through production deployment.


Responsibilities and Duties
  • Lead advanced break-fix troubleshooting for server blades, motherboards, power systems, and rack-scale infrastructure.
  • Support engineering bring-up activities, including component validation and firmware interaction testing.
  • Diagnose system-level failures involving thermal behavior, power anomalies, network configuration, and BIOS/BMC issues.
  • Collaborate with server engineering teams to perform root cause analysis and propose corrective actions or design improvements.
  • Support deployment and rollout of next-generation hardware platforms through structured validation and qualification cycles.
  • Interface with facilities and infrastructure teams to understand environmental factors impacting system reliability.
  • Develop and maintain standard operating procedures (SOPs), troubleshooting guides, and validation documentation.
  • Provide guidance and mentorship to junior technicians and engineers on troubleshooting methodologies and hardware diagnostics.
  • Participate in on-call rotations or off-hours support during critical engineering milestones or hardware bring-up phases.
Candidate Profile

Essential
  • Bachelor’s degree in Electrical Engineering, Computer Engineering, Computer Science, or related discipline.
  • Strong experience with server hardware architectures and board-level debugging.
  • Experience analyzing system logs, hardware telemetry, and power/thermal metrics to isolate hardware failures.
  • Hands-on experience with HPC systems, AI compute platforms, or rack-scale infrastructure.
  • Strong collaboration skills and ability to work effectively in fast-paced engineering environments.
  • Excellent written and verbal communication skills.
Desirable
  • Experience supporting prototype or pre-production hardware bring-up.
  • Familiarity with data center facilities, including liquid cooling and power distribution systems.
  • Experience using Python, Bash, or automation tools for hardware validation or troubleshooting.
  • Exposure to structured failure analysis and reliability engineering methodologies.

Top Skills

AI Compute Platforms
Bash
Hardware Architectures
HPC Systems
Python
Rack-Scale Infrastructure
