Tencent

Sr. AI Inference Systems Engineer

Reposted 9 Days Ago
In-Office
Palo Alto, CA, USA
120K-226K Annually
Senior level
Lead optimization of inference pipelines for large models, conduct research on hardware accelerators, and design high-performance inference frameworks. Mentor teams and drive technological innovation in AI inference optimization.
The summary above was generated by AI
What the Role Entails
  • End-to-End Inference Optimization: Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal); focus on KV Cache storage strategies, Router architecture design, and collaborative operator optimization to maximize throughput and minimize latency.

  • Heterogeneous Computing Research: Conduct in-depth research into the underlying inference logic of various hardware accelerators; evaluate architectural suitability for real-time, batch, and streaming inference scenarios to develop standardized optimization schemes.

  • Inference Framework & Toolchain: Design and implement high-performance inference frameworks; optimize scheduling and memory management to resolve long-tail issues such as communication latency and load imbalance in distributed inference.

  • Technological Innovation: Track global advancements in inference technology (e.g., compiler optimization, model compression, and hardware fusion); drive the productization of emerging technologies within production environments.

  • Technical Leadership: Lead efforts to overcome key technical bottlenecks in inference optimization; design technical roadmaps and mentor team members to build a robust AI inference technical ecosystem.

Who We Look For
  • Education & Experience: Master’s or Ph.D. in Computer Science, Electronic Engineering, AI, or related fields; significant professional experience in AI inference optimization or heterogeneous computing.

  • Hardware Expertise: Proficient in at least one AI accelerator architecture; deep understanding of underlying principles, instruction sets, and hardware-specific tuning.

  • Inference Specialization: Mastery of core inference optimization techniques, including multi-level KV Cache management, Quantization, and Intelligent Routing.

  • Systems Proficiency: Expert in parallel computing and distributed systems; deep understanding of low-level programming models (e.g., CUDA, Triton) and inference engine architectures.

  • Frameworks & Models: Familiar with mainstream deep learning frameworks (e.g., PyTorch, TensorFlow); experience in optimizing ultra-large-scale models is highly preferred.

  • Industry Insight: Stay current with global developments in inference technology and computing architectures, with the ability to objectively evaluate competing technical approaches.

  • Professional Skills: Strong analytical and cross-team collaboration skills, with a proven track record of leading complex inference projects to fruition.

  • Preferred Qualifications: Experience in tuning ultra-large-scale inference clusters or driving AI inference productization; high-level publications or core patents in relevant fields are a plus.

Location State(s)

US-California-Palo Alto

The expected base pay range for this position in the location(s) listed above is $120,100.00 to $225,700.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. Employees hired for this position may be eligible for a sign on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis. Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life and disability benefits, and participation in the Company’s 401(k) plan. The Employee is also eligible for up to 15 to 25 days of vacation per year (depending on the employee’s tenure), up to 13 days of holidays throughout the calendar year, and up to 10 days of paid sick leave per year. Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

HQ

Tencent Palo Alto, California, USA Office

2747 Park Blvd, Palo Alto, CA, United States, 94306

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms, and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology, thanks to a thriving art and music scene, excellent food, and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

  • Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Google, Apple, Salesforce, Meta
  • Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
  • Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
  • Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
