Boson AI

Network Engineer, AI/ML Infrastructure

Posted 15 Days Ago
In-Office
Santa Clara, CA
150K-250K Annually
Mid level
Design, build, and optimize high-performance network infrastructure for AI/ML operations, focusing on InfiniBand and Ethernet fabrics, while ensuring peak performance and network security.
The summary above was generated by AI
About The Role

We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology—managing InfiniBand and ultra-high-speed Ethernet fabrics that connect NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, and hundreds of servers.

You'll be hands-on with the full lifecycle of our network infrastructure: planning, building, testing, deploying, and keeping everything running at peak performance. That means troubleshooting issues as they arise, monitoring network performance and throughput, developing automation to streamline operations, and working closely with HPC and ML teams to ensure they have the bandwidth they need. You'll also help us plan for future capacity and evaluate emerging network technologies as we scale to meet increasingly demanding workloads.

Responsibilities

  • Configure and maintain InfiniBand and high-speed Ethernet fabrics
  • Optimize network performance for RDMA and GPU-to-GPU communication
  • Manage network switches (Mellanox, NVIDIA, Micas Networks)
  • Troubleshoot network bottlenecks and latency issues
  • Plan and execute network upgrades and expansions
  • Implement network security (firewalls, VLANs, ACLs)
  • Collaborate on storage network optimization
  • Infrastructure monitoring

Minimum Qualifications

  • 4+ years of network engineering experience in production environments
  • Strong understanding of L2/L3 networking protocols (TCP/IP, BGP, OSPF, VLANs)
  • Hands-on experience with high-speed networking (100Gb+ Ethernet and InfiniBand)
  • Hands-on experience with network security (firewalls, ACLs, network segmentation)
  • Knowledge of HPC network topologies
  • Experience with InfiniBand fabrics including RDMA, RoCE, IPoIB
  • Strong troubleshooting and problem-solving skills

Preferred Qualifications

  • Experience in data center environments or AI/ML infrastructure
  • Hands-on experience with high-performance Ethernet switches (e.g., Broadcom Tomahawk) and the latest InfiniBand switches (e.g., NVIDIA/Mellanox)
  • Experience optimizing networks for GPU-to-GPU communication
  • Experience with open-source firewall solutions (OPNsense, pfSense, or similar)
  • Experience with network automation tools
  • Understanding of distributed storage networking (Ceph cluster networks)
  • Familiarity with network monitoring and observability tools (Prometheus, Grafana)
  • Knowledge of multi-site network connectivity and WAN optimization
  • Familiarity with cloud networking in at least one platform (AWS, GCP, or Azure) including VPC design, site-to-site VPN configuration, Direct Connect/ExpressRoute/Cloud Interconnect, hybrid cloud connectivity, and cloud-to-datacenter network integration

If you're a natural problem-solver with a passion for continuous learning, we'd love to hear from you.

Top Skills

AWS
Azure
BGP
Ceph
Ethernet
GCP
Grafana
InfiniBand
Mellanox
NVIDIA
OSPF
Prometheus
TCP/IP
VLANs

Boson AI Santa Clara, California, USA Office

Santa Clara, CA, United States, 95054
