
fix: scope get_full_cu_seqlens cache key by device and inference mode#2728

Merged
cyanguwa merged 11 commits into NVIDIA:main from DmCarpe93:fix/get_full_cu_seqlens_cache_key_error
Apr 23, 2026

Conversation

@DmCarpe93
Contributor

@DmCarpe93 DmCarpe93 commented Mar 3, 2026

Description

Fixed an issue where the cu_seqlens tensor could be incorrectly retrieved from the cache.

  • Currently, only (batch_size, max_seqlen) was used as the cache key when retrieving cu_seqlens.
  • This could result in errors, especially during Knowledge Distillation training, because the teacher and student models may run on the same node (see the sketch after this list).
    • When the teacher model runs first, its cu_seqlens tensor is created and cached.
    • When the student model then trains on the same node, the cached cu_seqlens tensor is reused whenever the same (batch_size, max_seqlen) is requested.
    • Since the tensor cached by the teacher model may have a different device or inference mode, reusing it can raise an error.
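
For illustration, here is a minimal sketch of the collision, assuming a module-level dict cache and a get_full_cu_seqlens(batch_size, max_seqlen, device) signature (the exact names and structure in TransformerEngine may differ):

```python
import torch

_cache = {}  # hypothetical module-level cache keyed only by (batch_size, max_seqlen)


def get_full_cu_seqlens(batch_size, max_seqlen, device):
    """Return cumulative sequence lengths [0, max_seqlen, 2*max_seqlen, ...]."""
    key = (batch_size, max_seqlen)  # device and autograd mode are not part of the key
    if key not in _cache:
        _cache[key] = torch.arange(
            0,
            (batch_size + 1) * max_seqlen,
            step=max_seqlen,
            dtype=torch.int32,
            device=device,
        )
    return _cache[key]


# Teacher forward pass under inference mode populates the cache first.
with torch.inference_mode():
    teacher_cu = get_full_cu_seqlens(4, 128, torch.device("cpu"))

# The student training step with the same (batch_size, max_seqlen) gets the
# cached tensor back: it is an inference-mode tensor, and if the teacher had
# run on a different GPU it would also live on the wrong device.
student_cu = get_full_cu_seqlens(4, 128, torch.device("cpu"))
print(student_cu.is_inference())  # True, even though inference mode is no longer active
```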

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • The cache key for retrieving cu_seqlens was extended from (batch_size, max_seqlen) to also include the device and inference mode (see the sketch below).
  • Added test cases for the cu_seqlens cache.
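
A hedged sketch of the extended key, under the same assumptions as the example above (the real helper in utils.py may structure its cache differently):

```python
import torch

_cache = {}


def get_full_cu_seqlens(batch_size, max_seqlen, device):
    # Scope the key by device and inference mode so a tensor cached by one
    # caller (e.g. the teacher under torch.inference_mode() on one GPU) is
    # never handed to an incompatible caller (the student training on
    # another GPU) just because batch_size and max_seqlen happen to match.
    key = (batch_size, max_seqlen, device, torch.is_inference_mode_enabled())
    if key not in _cache:
        _cache[key] = torch.arange(
            0,
            (batch_size + 1) * max_seqlen,
            step=max_seqlen,
            dtype=torch.int32,
            device=device,
        )
    return _cache[key]
```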

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@greptile-apps
Contributor

greptile-apps Bot commented Mar 3, 2026

Greptile Summary

This PR fixes a cache-key collision in get_full_cu_seqlens where only (batch_size, max_seqlen) was used as the key, causing tensors created on one device or under inference mode to be silently reused by callers on a different device or in a different autograd mode (e.g. teacher vs. student in Knowledge Distillation). The fix adds device and torch.is_inference_mode_enabled() to the key, and two focused pytest cases validate both isolation scenarios.

Confidence Score: 5/5

Safe to merge — the fix is minimal, targeted, and well-tested with no regressions introduced.

The change is a one-liner key extension with clear semantics. torch.device is hashable and comparable by value, and torch.is_inference_mode_enabled() is a stable API. The two new tests cover exactly the described failure scenarios. No P0 or P1 issues were found.

No files require special attention.

Important Files Changed

  • transformer_engine/pytorch/attention/dot_product_attention/utils.py: Extends the get_full_cu_seqlens cache key from (batch_size, max_seqlen) to (batch_size, max_seqlen, device, is_inference), correctly isolating cached tensors across devices and inference modes.
  • tests/pytorch/attention/test_cu_seqlens_cache.py: New test file covering both multi-device isolation and inference-vs-training isolation for the cu_seqlens cache; uses an autouse fixture to clear the cache before/after each test.
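
A rough sketch of what the inference-vs-training isolation test could look like, assuming a get_full_cu_seqlens(batch_size, max_seqlen, device) signature and the import path from the file list above (the actual test file and its cache-clearing fixture may differ):

```python
import torch

from transformer_engine.pytorch.attention.dot_product_attention.utils import (
    get_full_cu_seqlens,
)


def test_inference_mode_does_not_leak_into_training():
    # Populate the cache while inference mode is enabled ...
    with torch.inference_mode():
        cu_inference = get_full_cu_seqlens(2, 64, torch.device("cpu"))
    assert cu_inference.is_inference()

    # ... then request the same (batch_size, max_seqlen) in normal training
    # mode; with the scoped key this must be a fresh, non-inference tensor.
    cu_training = get_full_cu_seqlens(2, 64, torch.device("cpu"))
    assert not cu_training.is_inference()
```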

Reviews (10). Last reviewed commit: "Merge branch 'main' into fix/get_full_cu..."

Contributor

@greptile-apps greptile-apps Bot left a comment


2 files reviewed, no comments


@ptrendx ptrendx requested a review from cyanguwa March 3, 2026 18:54
@DmCarpe93
Contributor Author

@cyanguwa When you have a moment, could you please take a look at this PR? Thanks:)

@DmCarpe93
Contributor Author

@cyanguwa This PR is pretty straightforward. Would you mind taking a quick look? Thank you:)

@DmCarpe93
Contributor Author

DmCarpe93 commented Apr 1, 2026

@cyanguwa Hi:) could you look into this PR? thank you.

@DmCarpe93
Contributor Author

@ptrendx The review hasn’t been progressing—would it be possible to change the reviewer?
The same issue keeps occurring, and while we can work around it by modifying the training script used by our team, it’s inconvenient to apply this workaround every time.
It would be great if the fix could be properly reviewed and merged.

@ptrendx ptrendx added the community-contribution label Apr 21, 2026
@cyanguwa
Collaborator

/te-ci torch L1

Collaborator

@cyanguwa cyanguwa left a comment


Thanks for the PR and sorry about the delay in reviewing! I'll run the CI and merge it. Will make another small PR to properly integrate the new test into our qa/ scripts later.

@cyanguwa cyanguwa merged commit ab60f4c into NVIDIA:main Apr 23, 2026
46 of 53 checks passed
YigongQin pushed a commit to YigongQin/TransformerEngine that referenced this pull request Apr 23, 2026
…NVIDIA#2728)

* fix: scope get_full_cu_seqlens cache key by device and inference mode

Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

Labels

2.16.0, community-contribution
