Prime Intellect

Research Engineer - RL Infrastructure

Reposted 8 Days Ago
In-Office or Remote
Hiring Remotely in San Francisco, CA, USA
Mid level
The Research Engineer will optimize large-scale RL training systems by enhancing performance in networking, memory, and computation. Responsibilities include designing low-level optimizations, collaborating with various teams to improve infrastructure, and contributing to open-source projects.
The summary above was generated by AI
Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack: from frontier agentic models to the infrastructure that enables anyone to train, adapt, and deploy them.

We unify globally distributed compute into a single control plane and pair it with the full reinforcement learning post-training stack: environments, secure sandboxes, verifiable evaluations, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end RL at frontier scale, adapting models to real tools, workflows, and deployment environments.

We are looking for a Research Engineer to work on the systems layer behind large-scale RL training. This role is for someone who enjoys going deep on performance: optimizing kernels, improving memory and communication efficiency, scaling distributed workloads, and pushing the throughput and reliability of training systems closer to hardware limits.

If you care about making large-scale model training faster, cheaper, and more robust, we’d love to talk.

What You’ll Work On
  • Build and optimize the systems infrastructure behind large-scale RL and distributed training workloads.

  • Improve end-to-end training efficiency across compute, memory, networking, and scheduling layers.

  • Design and implement low-level performance optimizations, including kernels, communication paths, and runtime improvements.

  • Work on distributed training systems spanning data, tensor, and pipeline parallel workloads.

  • Help shape the architecture of our RL training stack, including async rollout and post-training systems.

  • Contribute to open-source libraries and internal infrastructure used for frontier-scale model training.

  • Collaborate closely with researchers and infrastructure engineers to translate bottlenecks into concrete systems improvements.

  • Stay at the frontier of training systems, inference systems, compiler/runtime tooling, and hardware-aware optimization techniques.

You May Be a Fit If You Have
  • Strong systems engineering experience in AI/ML infrastructure, especially around large-scale model training or inference.

  • Deep familiarity with PyTorch and distributed training frameworks such as PyTorch Distributed, DeepSpeed, FSDP, Megatron, vLLM, Ray, or related tooling.

  • Experience optimizing training performance across kernels, memory movement, communication overhead, or parallelization strategy.

  • Hands-on experience with large-scale training techniques including data parallelism, tensor parallelism, and pipeline parallelism.

  • Strong understanding of GPU architecture, profiling, and performance debugging.

  • Ability to identify bottlenecks across the stack and drive improvements from first principles.

  • Comfort working in a fast-moving environment with ambiguous problems and high ownership.

Especially Exciting
  • Experience writing or optimizing CUDA / Triton kernels.

  • Experience with compiler or runtime optimization for ML systems.

  • Experience working on RL training infrastructure, rollout systems, or asynchronous training pipelines.

  • Experience with multi-node GPU clusters and high-performance networking.

  • Contributions to open-source ML systems or infrastructure projects.

  • Interest in publishing technical work or sharing insights through engineering blogs and technical writing.

Why This Role Matters

The next frontier in AI will not be unlocked by models alone. It will be unlocked by systems that let those models train faster, adapt continuously, and operate across real environments at scale.

That infrastructure does not exist yet in the form the world needs.

We’re building it.

Benefits & Perks
  • Cash compensation range of $150k-$300k, plus equity.

  • Flexible work arrangements, with the option to work remotely or in person from our San Francisco office.

  • Visa sponsorship and relocation support for international candidates.

  • Quarterly team offsites, hackathons, conferences, and learning opportunities.

  • A deeply technical, high-agency team working on infrastructure for open superintelligence.

If you’re excited about building the systems foundation for frontier-scale RL and open superintelligence, we’d love to hear from you.

HQ

Prime Intellect San Francisco, California, USA Office

San Francisco, CA, United States

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
