perf: optimize qwen3.5 hybrid linear cache flow [4/N]. #1160

Merged
yingxudeng merged 2 commits into jd-opensource:main from JC-ut0:gdn_cache_fix
Apr 5, 2026

Conversation

@JC-ut0 (Contributor) commented Apr 1, 2026

Add logic to AclGraph to correctly identify valid KV caches in mixed-layer models, and refactor WorkerImpl to selectively allocate specific cache tensors (conv/ssm vs. key/value) per layer.
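As a rough illustration of the idea, the sketch below allocates only the buffers a layer needs (key/value for full attention, conv/ssm for linear attention) and then treats a layer as holding a valid KV cache only when both key and value exist. All names here (`LayerKind`, `LayerCache`, `CacheShape`, `allocate_layer_cache`, `first_valid_kv_layer`) are hypothetical and use plain `std::vector` buffers; they are not the actual `WorkerImpl` or `AclGraph` code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer kinds for a hybrid model (illustrative names only).
enum class LayerKind { kFullAttention, kLinearAttention };

// Hypothetical per-layer cache: buffers the layer does not need stay empty.
struct LayerCache {
  std::vector<float> key;    // full attention only
  std::vector<float> value;  // full attention only
  std::vector<float> conv;   // linear (GDN) attention only
  std::vector<float> ssm;    // linear (GDN) attention only
};

// Hypothetical element counts for each buffer type.
struct CacheShape {
  std::size_t kv_elems;
  std::size_t conv_elems;
  std::size_t ssm_elems;
};

// Allocate only the tensors that match the layer kind.
inline LayerCache allocate_layer_cache(LayerKind kind, const CacheShape& shape) {
  assert(shape.kv_elems > 0 && shape.conv_elems > 0 && shape.ssm_elems > 0);
  LayerCache cache;
  if (kind == LayerKind::kFullAttention) {
    cache.key.assign(shape.kv_elems, 0.0f);
    cache.value.assign(shape.kv_elems, 0.0f);
  } else {
    cache.conv.assign(shape.conv_elems, 0.0f);
    cache.ssm.assign(shape.ssm_elems, 0.0f);
  }
  return cache;
}

// A layer has a valid KV cache only if both key and value were allocated.
inline bool has_valid_kv_cache(const LayerCache& cache) {
  return !cache.key.empty() && !cache.value.empty();
}

// Return the index of the first layer with a valid KV cache, or -1 if none.
// In a mixed-layer model, layer 0 may be a linear layer with no key/value.
inline int first_valid_kv_layer(const std::vector<LayerCache>& caches) {
  for (std::size_t i = 0; i < caches.size(); ++i) {
    if (has_valid_kv_cache(caches[i])) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With a layout of three linear layers followed by one full attention layer, `first_valid_kv_layer` would return 3 instead of assuming that layer 0 carries key/value tensors.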

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for hybrid attention models (such as qwen3_next) by differentiating between full attention and linear (GDN) attention layers during KV cache estimation and allocation. Key changes include updating LLMEngine and RecEngine to calculate cache capacity based on specific layer types, adding logic to AclGraph to correctly identify valid KV caches in mixed-layer models, and refactoring WorkerImpl to selectively allocate specific cache tensors (conv/ssm vs. key/value) per layer. Review feedback highlights the need for better consistency across the engine by utilizing the centralized is_full_attention_layer helper function to avoid logic errors related to default attention intervals and potential division-by-zero issues.
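To make the review feedback concrete, here is a minimal sketch of what a centralized helper with a zero-interval guard could look like. The signature, the `(layer_idx + 1) % interval == 0` rule, and the fallback of treating every layer as full attention when the interval is unset are all assumptions for illustration; the real `is_full_attention_layer` in the repository may differ. `count_full_attention_layers` is a hypothetical companion showing how capacity estimation could reuse the same helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version of a centralized helper. Assumes every
// `full_attention_interval`-th layer (1-based) uses full attention and all
// other layers use linear (GDN) attention.
inline bool is_full_attention_layer(int64_t layer_idx,
                                    int64_t full_attention_interval) {
  assert(layer_idx >= 0);
  // Guard against an unset or invalid interval: a modulo by zero would be
  // undefined behavior. Assumption: such models have only full attention.
  if (full_attention_interval <= 0) {
    return true;
  }
  return (layer_idx + 1) % full_attention_interval == 0;
}

// Count full attention layers by calling the one helper, so that cache
// estimation and cache allocation cannot disagree about layer types.
inline int64_t count_full_attention_layers(int64_t n_layers,
                                           int64_t full_attention_interval) {
  int64_t count = 0;
  for (int64_t i = 0; i < n_layers; ++i) {
    if (is_full_attention_layer(i, full_attention_interval)) {
      ++count;
    }
  }
  return count;
}
```

Routing every call site through one function like this keeps the default-interval handling in a single place, which is the consistency concern the review raises.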

Removed unused layer types variable from worker_impl.cpp
@RobbieLeung (Collaborator)

BTW, the KV cache initialization needs to be split out into a separate function. It's too complex right now.

@JC-ut0 (Contributor, Author) commented Apr 3, 2026

> BTW, the KV cache initialization needs to be split out into a separate function. It's too complex right now.

Sure, the KV cache initialization will be refactored in the next PR.

@yingxudeng (Collaborator)

[image]

@yq33victor (Collaborator) left a comment

LGTM

@yingxudeng yingxudeng merged commit 34c9a59 into jd-opensource:main Apr 5, 2026
25 of 69 checks passed

5 participants