IntelliPro Group Inc.

Research Scientist / Engineer – Training Infrastructure

In-Office
Palo Alto, CA
220K-300K Annually
Mid level
Job Title: Research Scientist / Engineer – Training Infrastructure
Position Type: Full-time
Location: Palo Alto, CA • Remote - US • Remote - International
Salary Range: $220,000 - $300,000 (USD)
Job ID#: 154559
Job Description:

We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable, and useful systems, the next step-function change will come from vision. So we are training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change. We are looking for engineers with significant experience solving hard problems in PyTorch, CUDA, and distributed systems. You will work alongside the rest of the research team to build and train cutting-edge foundation models, designed from the ground up to scale, on thousands of GPUs.

Responsibilities
  • Design, implement, and optimize efficient distributed training systems for models trained on thousands of GPUs
  • Research and implement advanced parallelization techniques (FSDP, Tensor Parallel, Pipeline Parallel, Expert Parallel)
  • Build monitoring, visualization, and debugging tools for large-scale training runs
  • Optimize training stability, convergence, and resource utilization across massive clusters
Requirements:
  • Extensive experience with distributed PyTorch training and parallelism strategies for foundation model training
  • Deep understanding of GPU clusters, networking, and storage systems
  • Familiarity with communication libraries (NCCL, MPI) and distributed system optimization
  • (Preferred) Strong Linux systems administration and scripting capabilities
  • (Preferred) Experience managing training runs across >100 GPUs
  • (Preferred) Experience with containerization, orchestration, and cloud infrastructure
About Us:
Founded in 2009, IntelliPro is a global leader in talent acquisition and HR solutions. Our commitment to delivering unparalleled service to clients, fostering employee growth, and building enduring partnerships sets us apart. We continue to lead in global talent solutions, with a dynamic presence in over 160 countries, including the USA, China, Canada, Singapore, Japan, the Philippines, the UK, India, the Netherlands, and the EU.
IntelliPro, a global leader connecting individuals with rewarding employment opportunities, is dedicated to understanding your career aspirations. As an Equal Opportunity Employer, IntelliPro values diversity and does not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, genetic information, disability, or any other legally protected group status. Moreover, our Inclusivity Commitment emphasizes embracing candidates of all abilities and ensures that our hiring and interview processes accommodate the needs of all applicants. Learn more about our commitment to diversity and inclusivity at https://intelliprogroup.com/.
Compensation: The pay offered to a successful candidate will be determined by various factors, including education, work experience, location, job responsibilities, certifications, and more. Additionally, IntelliPro provides a comprehensive benefits package, all subject to eligibility.

Top Skills

CUDA
MPI
NCCL
PyTorch

HQ

IntelliPro Group Inc. Santa Clara, California, USA Office

3120 Scott Blvd, Ste 301, Santa Clara, CA, United States, 95054


What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world's most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country's most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
