-
Notifications
You must be signed in to change notification settings - Fork 136
bugfix: support disagg PD for MTP. #551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1fad4fe to
3847fc1
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the KV cache transfer mechanism, seemingly to support disaggregated pipeline parallelism for MTP. The changes primarily involve removing the logic for transferring embeddings and modifying the KV cache allocation and deallocation to use direct aclrt calls instead of the llm_data_dist abstraction. The refactoring of push_kv_blocks to consolidate duplicated logic is a good improvement for maintainability.
My review includes suggestions to remove dead code, fix a potential type-safety issue with pointer casting, and improve code style and memory management. Overall, the changes are logical and well-structured.
3847fc1 to
fb4069c
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the KV cache transfer mechanism, particularly for speculative decoding on NPU, to support disaggregated pipeline parallelism. The key changes include:
- Modifying the KV cache allocation to use
aclrtMallocfor direct device memory allocation followed by registration withllm_data_dist, instead of relying onllm_data_distfor allocation. - Removing the logic for handling a separate embedding cache transfer, which simplifies the
SpecKVCacheTransferclass by removing several functions and member variables. - Refactoring
push_kv_blocksandpush_kv_blocks_specto eliminate code duplication by introducing a commonpush_kv_blocks_internalfunction.
The changes are well-structured and improve code clarity and maintainability. I have a couple of suggestions to improve robustness and clarity: one regarding the use of a logging macro with a misleading error message, and another to prevent a potential null pointer dereference.
fb4069c to
881bf6c
Compare
No description provided.