AI Platform Engineer, Training and Inference

Saviynt

Reposted 12 Days Ago

Hybrid

Milpitas, CA, USA

Mid level

The AI Platform Engineer will own distributed training and inference operations using Ray on GPUs, manage the lifecycle of machine learning models, optimize performance, and integrate reinforcement learning. Key responsibilities include overseeing the Ray ecosystem, configuring training and inference systems, and ensuring quality and performance metrics are met throughout the model lifecycle.

The summary above was generated by AI

AI Platform Engineer – Training & Inference

Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization's applications, data, and business processes. Customers trust Saviynt to safeguard their digital assets, drive operational efficiency, and reduce compliance costs. Built for the AI age, Saviynt is today helping organizations safely accelerate their deployment and usage of AI. Saviynt is recognized as the leader in identity security, with solutions that protect and empower the world's leading brands, Fortune 500 companies and government institutions. For more information, please visit www.saviynt.com.

The AI Platform team is building the compute layer that trains, evaluates, and serves every AI model at Saviynt. We need an AI Platform Engineer to own distributed training on Ray and H100 GPUs, the multi-engine LLM inference mesh (vLLM, SGLang, NVIDIA Triton), and the full model promotion lifecycle, from shadow mode through canary rollout to GA.

The AI Platform team's mission is to build a secure, scalable, product-agnostic AI foundation that enables Saviynt's identity products to deliver measurable AI-powered outcomes. Training & Inference is the engine — it turns data into deployed models that make Saviynt's products smarter.

What You Will Be Doing

• Own the Ray ecosystem end-to-end: manage KubeRay on GKE, tune Ray Core Task/Actor scheduling, operate the Plasma distributed object store, and configure Ray Data for GPU-direct streaming from GCS/S3
• Operate distributed training with Ray Train: configure TorchTrainer + DDP/NCCL for multi-node H100 clusters, manage checkpoint lifecycle, implement spot-preemption recovery, and integrate warm-start fine-tuning for retrain pipelines
• Build and operate the LLM inference mesh with Ray Serve: compose vLLM (PagedAttention), SGLang (RadixAttention), and NVIDIA Triton (TensorRT/ONNX) as a unified deployment graph with Plasma zero-copy memory sharing
• Optimise inference performance: configure fractional GPU allocation, enable continuous batching, implement per-engine autoscaling based on request queue depth, and tune KV-cache block sizes
• Design and operate the model routing layer: capability-based, version-based, and tenant-based routing with cost-aware fallback between self-hosted SLMs and cloud LLMs
• Build RL training infrastructure: define Flyte workflows for RL pipelines (rollout, reward shaping, policy update, evaluation), integrate Ray RLlib or custom PPO/GRPO loops with Ray Train, and manage replay buffer persistence on GCS
• Operate the full model promotion lifecycle: quality gate → integration tests → load tests (k6) → shadow mode → A/B gate → canary (10%→100%) with golden-signal auto-rollback
• Operate the retrain pipeline: drift detection triggers, warm-start retraining, relative quality gates (V2 >= V1 − 2%), and automated Flyte DAG through to canary
• Integrate RAG retrieval into the inference mesh: vector similarity search, context assembly, and prompt construction before LLM inference

What You Bring

• Experience in ML engineering with time in an ML platform or MLOps role
• Production Ray depth: Ray Train, Serve, Core, and Data — debugged real production failures including NCCL timeouts, Plasma OOM, and Serve autoscaling lag
• LLM serving engines: hands-on with vLLM, SGLang, or NVIDIA Triton — PagedAttention, prefix caching, and continuous batching tuned for latency/throughput targets
• Distributed training: DDP, FSDP, NCCL collectives, gradient checkpointing, and mixed precision (BF16/FP8)
• RL working knowledge: PPO, policy gradient, or RLHF, with the ability to translate an algorithm into distributed compute primitives
• Model lifecycle operations: MLflow registry, shadow/A/B/canary patterns, and auto-rollback on golden signal degradation
• Vector databases (pgvector or Qdrant): ANN index strategies, embedding upsert, and query latency tuning under inference load
• Strong Python and PyTorch; Flyte or equivalent ML orchestrator
• Quantization (nice to have): INT8/INT4/FP8 post-training quantization (GPTQ, AWQ, or bitsandbytes)
• Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience or equivalent military experience

We offer a competitive total rewards package, learning opportunities, and tremendous room to grow and advance in your career. At Saviynt, it is not typical for an individual to be hired at or near the top of the range for their role, and final compensation decisions depend on many factors, including but not limited to location; skill sets; experience and training; licensure and certifications; and other relevant business and organizational needs.

You may also be eligible to participate in a Saviynt discretionary bonus plan, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.
