Sciforium

LLM Training Engineer

Reposted 15 Days Ago
In-Office
San Francisco, CA, USA
155K-220K Annually
Mid level
As an LLM Training Engineer, you'll work on pretraining, scaling, post-training, and deployment of AI models, improving infrastructure and performance.

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.

About the Role

As an LLM Training Engineer, you’ll work across the full foundation-model stack: pretraining and scaling, post-training and reinforcement learning, sandbox environments for evaluation and agentic learning, and deployment and inference optimization. You’ll build and iterate quickly on research ideas, contribute production-grade infrastructure, and help deliver models that can serve real-world use cases at scale.

What you’ll work on

This role spans multiple tracks; candidates may focus on one or contribute across several. Examples include:

Pretraining & Scaling

  • Train large byte-native foundation models across massive, heterogeneous corpora

  • Design stable training recipes and scaling laws for novel architectures

  • Improve throughput, memory efficiency, and utilization on large GPU clusters

  • Build and maintain distributed training infrastructure and fault-tolerant pipelines

Post-training & RL

  • Develop post-training pipelines (SFT, preference optimization, RLHF/RLAIF, RL)

  • Curate and generate targeted datasets to improve specific model capabilities

  • Build reward models and evaluation frameworks to drive iterative improvement

  • Explore inference-time learning and compute techniques to enhance performance

Sandbox Environments & Evaluation

  • Build scalable sandbox environments for agent evaluation and learning

  • Create realistic, high-signal automated evals for reasoning, tool use, and safety

  • Design offline and online environments that support RL-style training at scale

  • Instrument environments for observability, reproducibility, and iteration speed

Deployment & Inference Optimization

  • Optimize inference throughput/latency for byte-native architectures

  • Build high-performance serving pipelines (KV caching, batching, quantization, etc.)

  • Improve end-to-end model efficiency, cost, and reliability in production

  • Profile and optimize GPU kernels, runtime bottlenecks, and memory behavior

Ideal candidate credentials

Technical strength

  • Strong general software engineering skills (writing robust, performant systems)

  • Experience with training or serving large neural networks (LLMs or similar)

  • Solid grasp of deep learning fundamentals and modern literature

  • Comfort working in high-performance environments (GPU, distributed systems, etc.)

Relevant experience (one or more)

  • Pretraining / large-scale distributed training (FSDP/ZeRO/Megatron-style systems)

  • Post-training pipelines (SFT, RLHF/RLAIF, preference optimization, eval loops)

  • Building RL environments, simulators, or agent frameworks

  • Inference optimization, model compression, quantization, kernel-level profiling

  • Building large ETL pipelines for internet-scale data ingestion and cleaning

  • Owning end-to-end production ML systems with monitoring and reliability

Research orientation

  • Ability to propose and evaluate research ideas quickly

  • Strong experimental hygiene: ablations, metrics, reproducibility, analysis

  • Bias toward building: you can turn ideas into working code and results

Education

  • MS or PhD in Computer Science, Machine Learning, AI, Mathematics, or a related field

Benefits include

  • Medical, dental, and vision insurance

  • 401k plan

  • Daily lunch, snacks, and beverages

  • Flexible time off

  • Competitive salary and equity

Equal opportunity

Sciforium is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

HQ

Sciforium San Francisco, California, USA Office

San Francisco, CA, United States

Sciforium Los Altos, California, USA Office

4401 El Camino Real, Los Altos, California, United States, 94022
