fix: scope get_full_cu_seqlens cache key by device and inference mode #2728
Conversation
Greptile Summary: This PR fixes a cache-key collision in the get_full_cu_seqlens cache by scoping the key by device and inference mode.

Confidence Score: 5/5. Safe to merge: the fix is minimal, targeted, and well-tested, with no regressions introduced. The change is a one-line key extension with clear semantics. torch.device is hashable and comparable by value, and torch.is_inference_mode_enabled() is a stable API. The two new tests cover exactly the described failure scenarios. No P0 or P1 issues were found, and no files require special attention.
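The collision the summary describes can be illustrated with a small stand-in (pure Python, no torch; all names here are illustrative, not TransformerEngine's actual code): when the cache key omits the device, a lookup for one device silently returns an entry built for another.

```python
# Sketch of the OLD behavior: the cache key omits device (and inference
# mode), so a lookup in one context can return an entry from another.
old_cache = {}

def old_lookup(batch_size, max_seqlen, device):
    key = (batch_size, max_seqlen)  # device NOT part of the key
    if key not in old_cache:
        # Stand-in for allocating a cu_seqlens tensor on `device`.
        old_cache[key] = {"device": device}
    return old_cache[key]

first = old_lookup(2, 128, "cuda:0")
second = old_lookup(2, 128, "cuda:1")  # cache hit despite different device
print(second["device"])  # prints "cuda:0": the cuda:1 caller got a cuda:0 entry
```

The same aliasing occurs between inference-mode and regular tensors, which is why the fix extends the key with both pieces of context.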
Reviews (10). Last reviewed commit: "Merge branch 'main' into fix/get_full_cu..."
@cyanguwa When you have a moment, could you please take a look at this PR? Thanks :)

@cyanguwa This PR is pretty straightforward. Would you mind taking a quick look? Thank you :)

@cyanguwa Hi :) could you look into this PR? Thank you.

@ptrendx The review hasn't been progressing; would it be possible to change the reviewer?
/te-ci torch L1 |
cyanguwa
left a comment
Thanks for the PR and sorry about the delay in reviewing! I'll run the CI and merge it. Will make another small PR to properly integrate the new test to our qa/ scripts later.
…NVIDIA#2728)

* fix: scope get_full_cu_seqlens cache key by device and inference mode

Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Description
Fixed an issue where the cu_seqlens tensor was incorrectly retrieved from the cache.

Previously, only (batch_size, max_seqlen) was used as the cache key when retrieving cu_seqlens, so a cached tensor could be returned for the wrong device or for the wrong inference-mode context. The cache key is now scoped by device and inference mode in addition to (batch_size, max_seqlen).

Type of change
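A minimal sketch of the fixed lookup, assuming hypothetical names and a plain Python list standing in for the torch tensor (the real code keys on torch.device and torch.is_inference_mode_enabled(); this is only the shape of the fix, not TransformerEngine's implementation):

```python
_cu_seqlens_cache = {}

def get_full_cu_seqlens(batch_size, max_seqlen, device, inference_mode):
    # The fix: the key is scoped by device and inference mode as well,
    # so entries from different contexts can no longer collide.
    key = (batch_size, max_seqlen, device, inference_mode)
    if key not in _cu_seqlens_cache:
        # Stand-in for torch.arange(0, (batch_size + 1) * max_seqlen,
        # step=max_seqlen, device=device): cumulative sequence offsets
        # for batch_size full-length sequences.
        _cu_seqlens_cache[key] = list(
            range(0, (batch_size + 1) * max_seqlen, max_seqlen)
        )
    return _cu_seqlens_cache[key]

print(get_full_cu_seqlens(2, 4, "cuda:0", False))  # [0, 4, 8]
```

With this key, repeated calls in the same context still hit the cache, while calls that differ only in device or inference mode each get their own entry.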
Changes
Checklist: