Skip to content

Conversation

@RobbieLeung
Copy link
Collaborator

No description provided.

JimHsiung
JimHsiung previously approved these changes Dec 18, 2025
@XuZhang99 XuZhang99 changed the title bugfix: support PD for MTP. bugfix: support disagg PD for MTP. Dec 18, 2025
XuZhang99
XuZhang99 previously approved these changes Dec 29, 2025
DongheJin
DongheJin previously approved these changes Jan 4, 2026
@zhang-minchao
Copy link
Collaborator

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the KV cache transfer mechanism, seemingly to support disaggregated pipeline parallelism for MTP. The changes primarily involve removing the logic for transferring embeddings and modifying the KV cache allocation and deallocation to use direct aclrt calls instead of the llm_data_dist abstraction. The refactoring of push_kv_blocks to consolidate duplicated logic is a good improvement for maintainability.

My review includes suggestions to remove dead code, fix a potential type-safety issue with pointer casting, and improve code style and memory management. Overall, the changes are logical and well-structured.

@RobbieLeung
Copy link
Collaborator Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the KV cache transfer mechanism, particularly for speculative decoding on NPU, to support disaggregated pipeline parallelism. The key changes include:

  • Modifying the KV cache allocation to use aclrtMalloc for direct device memory allocation followed by registration with llm_data_dist, instead of relying on llm_data_dist for allocation.
  • Removing the logic for handling a separate embedding cache transfer, which simplifies the SpecKVCacheTransfer class by removing several functions and member variables.
  • Refactoring push_kv_blocks and push_kv_blocks_spec to eliminate code duplication by introducing a common push_kv_blocks_internal function.

The changes are well-structured and improve code clarity and maintainability. I have a couple of suggestions to improve robustness and clarity: one regarding the use of a logging macro with a misleading error message, and another to prevent a potential null pointer dereference.

@RobbieLeung RobbieLeung merged commit 6493d8e into jd-opensource:main Jan 5, 2026
17 of 19 checks passed
@RobbieLeung RobbieLeung deleted the feat/mtp_pd branch January 5, 2026 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants