perf: optimize qwen3.5 hybrid linear cache flow [4/N] #1160
yingxudeng merged 2 commits into jd-opensource:main
Conversation
Code Review
This pull request introduces support for hybrid attention models (such as qwen3_next) by differentiating between full attention and linear (GDN) attention layers during KV cache estimation and allocation. Key changes:

- LLMEngine and RecEngine now calculate cache capacity based on the specific layer types present in the model.
- AclGraph gains logic to correctly identify valid KV caches in mixed-layer models.
- WorkerImpl is refactored to selectively allocate the appropriate cache tensors (conv/ssm vs. key/value) per layer.

Review feedback highlights the need for better consistency across the engine: the centralized is_full_attention_layer helper function should be used everywhere to avoid logic errors related to default attention intervals and potential division-by-zero issues when a model contains no full-attention layers.
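The capacity estimation described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the layer descriptor, tensor sizes, and the free-standing `is_full_attention_layer` signature are assumptions; only the helper's name comes from the review. The key points it demonstrates are that linear (GDN) layers reserve fixed-size state rather than per-token KV cache, and that the divisor must be guarded when no full-attention layers exist.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layer descriptor; the real code derives this from the
// model config.
enum class LayerType { kFullAttention, kLinearAttention };

// Stand-in for the centralized is_full_attention_layer helper the
// review asks the engine to reuse consistently.
bool is_full_attention_layer(const std::vector<LayerType>& layers, size_t i) {
  return layers[i] == LayerType::kFullAttention;
}

// Estimate how many tokens of KV cache fit in `budget_bytes`.
// Only full-attention layers consume per-token key/value cache; linear
// layers hold fixed-size conv/ssm state. Returns 0 (instead of
// dividing by zero) when the model has no full-attention layers.
int64_t estimate_cache_capacity(const std::vector<LayerType>& layers,
                                int64_t budget_bytes,
                                int64_t kv_bytes_per_token_per_layer,
                                int64_t linear_state_bytes_per_layer) {
  int64_t n_full = 0;
  int64_t n_linear = 0;
  for (size_t i = 0; i < layers.size(); ++i) {
    if (is_full_attention_layer(layers, i)) {
      ++n_full;
    } else {
      ++n_linear;
    }
  }
  // Linear layers reserve their fixed state budget up front.
  const int64_t remaining =
      budget_bytes - n_linear * linear_state_bytes_per_layer;
  if (n_full == 0 || remaining <= 0) {
    return 0;  // guard against division by zero / exhausted budget
  }
  return remaining / (n_full * kv_bytes_per_token_per_layer);
}
```

Centralizing the layer-type check this way is what keeps the engine, graph, and worker paths from drifting apart on which layers count toward KV capacity.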
Removed unused layer types variable from worker_impl.cpp
BTW, the KV cache initialization needs to be split out into a separate function. It's too complex right now.
Sure, this KV cache initialization will be refactored in the next PR.
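The per-layer allocation split the PR describes (key/value tensors for full-attention layers, conv/ssm state for linear layers) can be illustrated with a small sketch. The `LayerCache` struct and string tensor names here are hypothetical placeholders for the device tensors WorkerImpl actually allocates; the point is the branching on layer type rather than the allocation mechanics.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical layer descriptor; the real code derives this from the
// model config.
enum class LayerType { kFullAttention, kLinearAttention };

// Placeholder for a layer's cache tensors; WorkerImpl allocates real
// device tensors, not names.
struct LayerCache {
  std::vector<std::string> tensors;
};

// Sketch of selective per-layer allocation: full-attention layers get
// key/value caches, linear (GDN) layers get conv/ssm state instead.
std::vector<LayerCache> allocate_caches(const std::vector<LayerType>& layers) {
  std::vector<LayerCache> caches;
  caches.reserve(layers.size());
  for (LayerType t : layers) {
    LayerCache c;
    if (t == LayerType::kFullAttention) {
      c.tensors = {"key_cache", "value_cache"};
    } else {
      c.tensors = {"conv_state", "ssm_state"};
    }
    caches.push_back(std::move(c));
  }
  return caches;
}
```

Isolating this loop in its own function is essentially the refactor the comment above asks for: the branching stays in one place instead of being interleaved with the rest of cache initialization.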
