feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches #221

Open
0hujun wants to merge 16 commits into modelscope:main from 0hujun:main
@0hujun (Contributor) commented Jun 8, 2026

feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches

Add monkey-patch support for NPU-accelerated DeepSeek-V4 attention and indexer
kernels via mindspeed, without modifying the transformers source code.

Changes

  • New src/twinkle/kernel/deepseek_v4_npu.py: Core patch implementation

    • _patched_attention_forward: Replaces DeepseekV4Attention.forward with
      mindspeed.ops.npu_sparse_attn_shared_kv.SparseAttnSharedKV fused kernel.
      Supports all three layer types: sliding_attention, CSA, and HCA.
    • _patched_indexer_forward: Replaces DeepseekV4Indexer.forward with
      mindspeed.ops.npu_lightning_indexer for NPU-accelerated top-k selection.
    • Compressor wrappers ensure 3-tuple return values for compatibility.
    • All patches include ImportError fallback to original implementations.
  • Modified src/twinkle/kernel/monkey_patch_npu.py: Registration and control

    • New _apply_deepseek_v4_npu_patch() called from apply_npu_patch().
    • Two environment variables: TWINKLE_NPU_DSV4_SAS and TWINKLE_NPU_DSV4_LI.
    • Raises ValueError if both SAS and LI are enabled simultaneously.
    • Auto-detects DeepSeek-V4 via config.architectures.
  • New cookbook/transformers/deepseek_v4_patch/README.md: Documentation with
    dependency list, env var reference, and usage examples.
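The patch-with-fallback pattern described above can be sketched as follows. This is a minimal illustration, not the code in deepseek_v4_npu.py: `DummyAttention`, `apply_forward_patch`, `make_fused_forward`, and `ensure_three_tuple` are hypothetical names, and padding short results with `None` is only one assumed way a wrapper could guarantee a 3-tuple.

```python
import functools
import importlib


class DummyAttention:
    """Stand-in for a model attention class (not the real DeepseekV4Attention)."""

    def forward(self, x):
        return ("original", x)


def apply_forward_patch(cls, kernel_module, make_forward):
    """Replace cls.forward only if kernel_module can be imported.

    Returns True when the patch was applied and False when the original
    forward was kept because the import failed.
    """
    try:
        kernel = importlib.import_module(kernel_module)
    except ImportError:
        return False  # fallback: leave the original implementation in place
    original = cls.forward
    cls.forward = make_forward(kernel, original)
    return True


def make_fused_forward(kernel, original):
    """Build the replacement forward; a real patch would call into `kernel`."""

    @functools.wraps(original)
    def fused_forward(self, x):
        # Placeholder body: only tags the output so the swap is observable.
        return ("patched", x)

    return fused_forward


def ensure_three_tuple(fn):
    """Wrap fn so that results shorter than three items are padded with None."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        if not isinstance(out, tuple):
            out = (out,)
        return out + (None,) * max(0, 3 - len(out))

    return wrapper
```

Because the import is attempted before the class is touched, a missing kernel package leaves the model untouched rather than half-patched.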

Environment Variables

| Variable | Default | Description |
|---|---|---|
| TWINKLE_NPU_DSV4_SAS | 0 | Enable NPU Sparse Attention |
| TWINKLE_NPU_DSV4_LI | 0 | Enable NPU Lightning Indexer |

SAS and LI cannot be enabled at the same time.
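A rough sketch of how the two flags and the architecture check could be resolved is shown below. The helper names, the set of accepted truthy strings, and the `"DeepseekV4"` substring match are assumptions made for illustration; only the two variable names, the default of 0, the `ValueError`, and the use of `config.architectures` come from the description.

```python
import os


def _env_flag(name, environ):
    # Assumption: these strings count as "enabled"; the default is "0".
    return environ.get(name, "0").strip().lower() in {"1", "true", "yes", "on"}


def resolve_dsv4_flags(environ=None):
    """Return (sas_enabled, li_enabled), rejecting the unsupported combination."""
    environ = os.environ if environ is None else environ
    sas = _env_flag("TWINKLE_NPU_DSV4_SAS", environ)
    li = _env_flag("TWINKLE_NPU_DSV4_LI", environ)
    if sas and li:
        raise ValueError(
            "TWINKLE_NPU_DSV4_SAS and TWINKLE_NPU_DSV4_LI cannot both be enabled"
        )
    return sas, li


def looks_like_deepseek_v4(config):
    """Assumed detection rule: any architecture name containing 'DeepseekV4'."""
    archs = getattr(config, "architectures", None) or []
    return any("DeepseekV4" in name for name in archs)
```

Passing the environment in as a mapping keeps the check easy to test without mutating `os.environ`.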

Dependencies

  • mindspeed: Provides SparseAttnSharedKV and npu_lightning_indexer NPU ops
  • torch_npu: Ascend NPU runtime
  • transformers: Must include DeepSeek-V4 model support
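One way to check up front whether the listed packages are importable is sketched below. `missing_dependencies` is a hypothetical helper, not part of this PR, and it only tests that each package can be found; it does not verify that the installed transformers release actually includes DeepSeek-V4 support.

```python
import importlib.util


def missing_dependencies(names=("mindspeed", "torch_npu", "transformers")):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing
```

An empty list means all three top-level packages were found in the current environment.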

Testing

Verified on Ascend A3 with DeepSeek-V4-Flash-BF16, 4 layers,
8-card EP, gradient checkpointing enabled:

| MAX_LENGTH | avg_seq_len | Result |
|---|---|---|
| 1024 (1K) | 1024 | PASS |
| 2048 (2K) | 2048 | PASS |
| 4096 (4K) | 4096 | PASS |
| 6144 (6K) | 6144 | PASS |

Time cost:

| Metric | Baseline (SAS OFF) | SAS ON | Delta |
|---|---|---|---|
| Avg. duration (6 layers) | 4.15 s | 3 s | −27.7% |
| Avg. duration (full layers) | 26.35 s | 15.95 s | −39.5% |

Avg. Loss 1e-3
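The relative change in duration and the equivalent speedup ratio can be recomputed from the reported averages. The snippet below is only a convenience sketch; the function names are illustrative.

```python
def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline` (negative = faster)."""
    return (new - baseline) / baseline * 100.0


def speedup(baseline, new):
    """How many times faster `new` is than `baseline`."""
    return baseline / new


# Average durations in seconds, taken from the table above.
rows = {"6 layers": (4.15, 3.0), "full layers": (26.35, 15.95)}
for name, (base, sas_on) in rows.items():
    print(f"{name}: {relative_change(base, sas_on):+.1f}% duration, "
          f"{speedup(base, sas_on):.2f}x speedup")
```

For the full-layer run this gives a duration reduction of about 39.5%, which corresponds to a speedup of about 1.65x.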

Usage

See cookbook/transformers/deepseek_v4_patch/README.md.

In collaboration with @meichangsu1.

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces NPU acceleration patches for DeepSeek-V4, adding Sparse Attention Shared-KV (SAS) and Lightning Indexer (LI) monkey-patches using mindspeed operators, along with a training script and documentation. Feedback highlights critical issues in the training script, including an undefined save_checkpoint function and an incorrectly configured DataLoader that lacks device_mesh for distributed training. Additionally, robustness improvements are recommended in the kernel patches, such as handling potential None values for sparse indices and catching broader exceptions to ensure reliable fallbacks to standard PyTorch implementations.
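The robustness points raised in this review could look roughly like the sketch below. It is an assumption-laden illustration: `forward_with_fallback`, `fused_fn`, and `reference_fn` are hypothetical names, and routing a `None` index tensor to the reference path is one possible policy, not necessarily what the patch does.

```python
import logging

logger = logging.getLogger(__name__)


def forward_with_fallback(fused_fn, reference_fn, hidden, sparse_indices=None):
    """Try the fused kernel, falling back to the reference implementation.

    Assumptions: the fused kernel needs indices, so None goes straight to the
    reference path, and any runtime failure (not only ImportError) triggers
    the fallback.
    """
    if sparse_indices is None:
        return reference_fn(hidden)
    try:
        return fused_fn(hidden, sparse_indices)
    except Exception as exc:  # deliberately broader than ImportError
        logger.warning("Fused kernel failed (%s); using reference path", exc)
        return reference_fn(hidden)
```

Logging the exception before falling back keeps silent performance regressions visible, which matters when the fallback is much slower than the fused path.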

Comment thread src/twinkle/kernel/deepseek_v4_npu.py Outdated
Comment thread src/twinkle/kernel/deepseek_v4_npu.py Outdated
Comment thread src/twinkle/kernel/deepseek_v4_npu.py Outdated
0hujun and others added 4 commits June 9, 2026 09:17
…r (LI) patches
Update src/twinkle/kernel/deepseek_v4_npu.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…r (LI) patches
Update src/twinkle/kernel/deepseek_v4_npu.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…r (LI) patches
Update src/twinkle/kernel/deepseek_v4_npu.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>