Conversation

@sbhavani
Collaborator

Description

Update documentation to reflect that cuDNN now supports causal sliding window attention (SWA) starting with version 9.2.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Updated backend support matrix table to show cuDNN supports SWA (cuDNN 9.2+, causal masks only)
  • Added SWA comparison between flash-attention and cuDNN in section 1.3
  • Added clarifying note in cp_ag_thd_dpa_jax_deep_dive.ipynb that cuDNN supports SWA but not all striping patterns for context parallelism
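The updated support matrix can be summarized as a simple dispatch rule: flash-attention supports SWA for any mask type, while cuDNN supports it only from 9.2 onward and only for causal masks. A minimal pure-Python sketch of that rule follows; the function and parameter names are illustrative, not the actual TransformerEngine backend-selection API.

```python
def swa_backend_available(backend, cudnn_version=(0, 0), attn_mask_type="causal"):
    """Return True if `backend` can run sliding window attention (SWA).

    Illustrative sketch of the support matrix described in this PR:
    flash-attention supports SWA for all mask types; cuDNN supports it
    starting with version 9.2, and only for causal masks.
    """
    if backend == "flash-attention":
        return True
    if backend == "cudnn":
        # cuDNN SWA: version 9.2+ and causal masks only.
        return cudnn_version >= (9, 2) and attn_mask_type == "causal"
    return False
```

For example, `swa_backend_available("cudnn", (9, 1), "causal")` is False because the version predates 9.2, while the same call with `(9, 2)` is True.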

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Technical details:
- cuDNN 9.2+: Supports causal SWA with window_size=(left, 0)
- cuDNN 9.6+: Enhanced support for asymmetric windows (left, right)
- Constraints: Requires dropout=0.0 and bias_type="no_bias"
- Only works with causal mask types
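The constraints above can be expressed as a single validation step. Below is a hedged sketch of such a check; `validate_cudnn_swa` is a hypothetical helper written for illustration, not a function in cuDNN or TransformerEngine.

```python
def validate_cudnn_swa(cudnn_version, window_size, attn_mask_type="causal",
                       dropout=0.0, bias_type="no_bias"):
    """Raise ValueError if a configuration violates the documented cuDNN SWA
    constraints; return True otherwise. Illustrative only."""
    if cudnn_version < (9, 2):
        raise ValueError("causal SWA requires cuDNN 9.2+")
    if attn_mask_type != "causal":
        raise ValueError("cuDNN SWA only works with causal mask types")
    if dropout != 0.0:
        raise ValueError("cuDNN SWA requires dropout=0.0")
    if bias_type != "no_bias":
        raise ValueError('cuDNN SWA requires bias_type="no_bias"')
    left, right = window_size
    # cuDNN 9.2+ accepts window_size=(left, 0); asymmetric (left, right)
    # windows need the enhanced support added in 9.6.
    if right != 0 and cudnn_version < (9, 6):
        raise ValueError("asymmetric windows (left, right) require cuDNN 9.6+")
    return True
```

Under these assumptions, `(9, 2)` with `window_size=(128, 0)` passes, while the same version with `(128, 64)` fails until cuDNN 9.6.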

Signed-off-by: Santosh Bhavani <[email protected]>
@sbhavani sbhavani requested a review from pggPL January 26, 2026 18:57
@greptile-apps
Contributor

greptile-apps bot commented Jan 26, 2026

Greptile Overview

Greptile Summary

Updated documentation to reflect that cuDNN now supports causal sliding window attention (SWA) starting with version 9.2.

Changes include:

  • Updated backend support matrix table to show cuDNN supports SWA (cuDNN 9.2+, causal masks only)
  • Added SWA comparison between flash-attention and cuDNN in section 1.3, specifying that cuDNN supports causal SWA but requires dropout=0.0 and bias_type="no_bias"
  • Added clarifying note in cp_ag_thd_dpa_jax_deep_dive.ipynb that cuDNN supports SWA but not all striping patterns for context parallelism

The documentation changes are accurate and consistent with the implementation in transformer_engine/common/fused_attn/fused_attn.cpp, which shows cuDNN 9.2 introduced SWA support for causal masks with specific limitations.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • Documentation-only changes that accurately reflect cuDNN 9.2+ SWA support. The changes are well-written, consistent with the codebase implementation, and provide clear guidance on the limitations (causal masks only, requires dropout=0.0 and bias_type="no_bias"). No code changes, no risk of runtime issues.
  • No files require special attention

Important Files Changed

Filename Overview
docs/examples/attention/attention.ipynb Updated backend support matrix and cuDNN attention section to document that cuDNN 9.2+ supports causal sliding window attention with specific constraints
docs/examples/attention/cp_ag_thd_dpa_jax_deep_dive.ipynb Added clarifying note about cuDNN SWA support and striping pattern limitations for context parallelism

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Docs as Documentation
    participant User as End User
    participant Code as TransformerEngine Code
    
    Dev->>Docs: Update attention.ipynb
    Note over Docs: Add SWA support info in section 1.3<br/>Update backend support matrix table<br/>Specify cuDNN 9.2+ causal SWA support
    
    Dev->>Docs: Update cp_ag_thd_dpa_jax_deep_dive.ipynb
    Note over Docs: Add note about cuDNN SWA support<br/>Clarify striping pattern limitations
    
    User->>Docs: Read documentation
    Docs-->>User: cuDNN 9.2+ supports causal SWA<br/>Requires dropout=0.0 and bias_type="no_bias"<br/>Not all striping patterns supported for CP
    
    User->>Code: Configure attention with SWA
    Note over Code: cuDNN 9.2+ implementation<br/>fused_attn.cpp enforces constraints
    Code-->>User: SWA execution with documented constraints

Contributor

@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments

