Tencent Jobs

Sr. AI Inference Systems Engineer

Tencent

Reposted 9 Days Ago

In-Office

Palo Alto, CA, USA

120K-226K Annually

Senior level

Lead optimization of inference pipelines for large models, conduct research on hardware accelerators, and design high-performance inference frameworks. Mentor teams and drive technological innovation in AI inference optimization.

The summary above was generated by AI

What the Role Entails

End-to-End Inference Optimization: Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal); focus on KV Cache storage strategies, Router architecture design, and collaborative operator optimization to maximize throughput and minimize latency.
Heterogeneous Computing Research: Conduct in-depth research into the underlying inference logic of various hardware accelerators; evaluate architectural suitability for real-time, batch, and streaming inference scenarios to develop standardized optimization schemes.
Inference Framework & Toolchain: Design and implement high-performance inference frameworks; optimize scheduling and memory management to resolve long-tail issues such as communication latency and load imbalance in distributed inference.
Technological Innovation: Track global advancements in inference technology (e.g., compiler optimization, model compression, and hardware fusion); drive the productization of emerging technologies within production environments.
Technical Leadership: Lead efforts to overcome key technical bottlenecks in inference optimization; design technical roadmaps and mentor team members to build a robust AI inference technical ecosystem.

Who We Look For

Education & Experience: Master’s or Ph.D. in Computer Science, Electronic Engineering, AI, or related fields; significant professional experience in AI inference optimization or heterogeneous computing.
Hardware Expertise: Proficient in at least one AI accelerator architecture; deep understanding of underlying principles, instruction sets, and hardware-specific tuning.
Inference Specialization: Mastery of core inference optimization techniques, including multi-level KV Cache management, Quantization, and Intelligent Routing.
Systems Proficiency: Expert in parallel computing and distributed systems; deep understanding of low-level programming models (e.g., CUDA, Triton) and inference engine architectures.
Frameworks & Models: Familiar with mainstream deep learning frameworks (e.g., PyTorch, TensorFlow); experience in optimizing ultra-large-scale models is highly preferred.
Industry Insight: Up to date on global developments in inference technology and computing architectures, with the ability to objectively evaluate different technical paths.
Professional Skills: Strong analytical and cross-team collaboration skills, with a proven track record of leading complex inference projects to fruition.
Preferred Qualifications: Experience in tuning ultra-large-scale inference clusters or driving AI inference productization; high-level publications or core patents in relevant fields are a plus.

Location State(s)

US-California-Palo Alto

The expected base pay range for this position in the location(s) listed above is $120,100.00 to $225,700.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. Employees hired for this position may be eligible for a sign on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis. Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life and disability benefits, and participation in the Company’s 401(k) plan. The Employee is also eligible for up to 15 to 25 days of vacation per year (depending on the employee’s tenure), up to 13 days of holidays throughout the calendar year, and up to 10 days of paid sick leave per year. Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

2747 Park Blvd, Palo Alto, CA, United States, 94306

Similar Jobs

DigitalOcean

Senior Engineer

13 Days Ago

In-Office

San Francisco, CA, USA

167K-209K Annually

Senior level

Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)

Join DigitalOcean as a Senior Engineer to design and develop high-scale AI data plane services, optimizing performance and mentoring junior engineers.

Top Skills: Go, gRPC, NVIDIA Dynamo, Python, Ray Serve, SGLang, vLLM

NVIDIA

Senior Software Engineer

5 Days Ago

In-Office

Santa Clara, CA, USA

184K-357K Annually

Senior level

Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse

The role involves architecting and optimizing AI inference systems, developing GPU kernels, and contributing to benchmark methodologies, requiring substantial experience in performance engineering and various programming technologies.

Top Skills: AWS, Azure, C/C++, CUDA, Docker, GCP, Go, Kubernetes, Python, Rust, Slurm

PwC

Connected Supply Chain, Planning - Kinaxis, Manager

An Hour Ago

Hybrid

San Francisco, CA, USA

99K-232K Annually

Mid level

Artificial Intelligence • Professional Services • Business Intelligence • Consulting • Cybersecurity • Generative AI

Lead client engagements to optimize supply chain planning using Kinaxis and analytics. Manage projects, mentor staff, design inventory and distribution strategies, implement SCM technology, and ensure performance and compliance.

Top Skills: Data Analytics, Kinaxis, Supply Chain Management Software

What you need to know about the San Francisco Tech Scene

San Francisco and the surrounding Bay Area attract more startup funding than any other region in the world. Home to Stanford University and UC Berkeley, leading VC firms and several of the world’s most valuable companies, the Bay Area is the place to go for anyone looking to make it big in the tech industry. That said, San Francisco has a lot to offer beyond technology thanks to a thriving art and music scene, excellent food and a short drive to several of the country’s most beautiful recreational areas.

Key Facts About San Francisco Tech

Number of Tech Workers: 365,500; 13.9% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Google, Apple, Salesforce, Meta
Key Industries: Artificial intelligence, cloud computing, fintech, consumer technology, software
Funding Landscape: $50.5 billion in venture capital funding in 2024 (Pitchbook)
Notable Investors: Sequoia Capital, Andreessen Horowitz, Bessemer Venture Partners, Greylock Partners, Khosla Ventures, Kleiner Perkins
Research Centers and Universities: Stanford University; University of California, Berkeley; University of San Francisco; Santa Clara University; Ames Research Center; Center for AI Safety; California Institute for Regenerative Medicine
