Pluralis Research

Machine Learning Engineer - Distributed ML Systems

Reposted 12 Days Ago
In-Office or Remote
Hiring Remotely in San Francisco, CA, USA
Senior level
As a Machine Learning Engineer, you'll design and implement distributed systems for training large-scale ML models while optimizing performance under challenging conditions.
Overview

Pluralis Research carries out foundational research on Protocol Learning: multi-participant training of foundation models where no single participant has, or can ever obtain, a full copy of the model. The purpose of Protocol Learning is to facilitate the creation of community-trained and community-owned frontier models with self-sustaining economics.

We're looking for Senior/Staff engineers with 5+ years of experience in distributed systems and large-scale ML training. You'll be implementing a novel substrate for distributed ML model training that works over consumer-grade internet connections.

Responsibilities

Distributed Training Architecture & Optimization
  • Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.

  • Develop and optimize parallel training strategies (data, tensor, and pipeline parallelism) with custom sharding techniques that minimize communication overhead.

  • Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.

  • Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.

  • Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.

Decentralized Networking & Resilience
  • Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.

  • Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.

  • Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.

  • Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.

What You’ll Bring
  • Strong experience building and operating distributed systems in production.

  • Hands-on expertise with distributed training frameworks (FSDP, DeepSpeed, Megatron, or similar).

  • Deep understanding of parallelism strategies (data, tensor, and pipeline parallelism).

  • Expert-level Python with production experience (concurrency, error handling, retry logic, clean architecture).

  • Strong networking fundamentals: P2P systems, gRPC, routing, NAT traversal, distributed coordination.

  • Experience optimizing GPU workloads, memory management, and large-scale compute efficiency.

What We Offer
  • Equity-heavy compensation with meaningful ownership in a mission-driven company

  • Competitive base salary for senior engineering roles in Australia

  • Visa sponsorship available for exceptional candidates

  • Remote-first with optional access to our Melbourne hub

  • World-class team: teammates were previously at Google, Amazon, Microsoft, and leading startups

Backed by Union Square Ventures and other tier-1 investors, we're a world-class, deeply technical team of ML researchers and engineers. Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible approach to preventing a handful of massive corporations from monopolising model development, access and release, and achieving massive economic capture. If this resonates, please apply.
