feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches by 0hujun · Pull Request #221 · modelscope/twinkle

0hujun · 2026-06-08T15:04:08Z

feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches

Add monkey-patch support for DeepSeek-V4 NPU accelerated attention and indexer
kernels via mindspeed, without modifying the transformers source code.

Changes

- New src/twinkle/kernel/deepseek_v4_npu.py: core patch implementation
  - _patched_attention_forward: Replaces DeepseekV4Attention.forward with the
    mindspeed.ops.npu_sparse_attn_shared_kv.SparseAttnSharedKV fused kernel.
    Supports all three layer types: sliding_attention, CSA, and HCA.
  - _patched_indexer_forward: Replaces DeepseekV4Indexer.forward with
    mindspeed.ops.npu_lightning_indexer for NPU-accelerated top-k selection.
  - Compressor wrappers ensure 3-tuple return values for compatibility.
  - All patches fall back to the original implementations on ImportError.
- Modified src/twinkle/kernel/monkey_patch_npu.py: registration and control
  - New _apply_deepseek_v4_npu_patch(), called from apply_npu_patch().
  - Two environment variables: TWINKLE_NPU_DSV4_SAS and TWINKLE_NPU_DSV4_LI.
  - Raises ValueError if both SAS and LI are enabled simultaneously.
  - Auto-detects DeepSeek-V4 via config.architectures.
- New cookbook/transformers/deepseek_v4_patch/README.md: documentation with
  dependency list, env var reference, and usage examples.
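
The ImportError fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in deepseek_v4_npu.py: `patch_forward`, `make_fused_forward`, and `ToyAttention` are hypothetical names, and a deliberately missing module name stands in for mindspeed.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def patch_forward(cls, make_forward, required_module):
    """Replace cls.forward only if `required_module` can be imported.

    Returns True if the patch was applied, False if the original was kept.
    """
    try:
        kernel_module = importlib.import_module(required_module)
    except ImportError:
        # Dependency missing: leave the original forward untouched.
        logger.warning("%s not available; keeping original %s.forward",
                       required_module, cls.__name__)
        return False
    original_forward = cls.forward
    cls.forward = make_forward(kernel_module, original_forward)
    return True


class ToyAttention:
    """Hypothetical stand-in for a model layer such as DeepseekV4Attention."""

    def forward(self, x):
        return ("original", x)


def make_fused_forward(kernel_module, original_forward):
    """Build the replacement forward; a real patch would call the fused op."""

    def forward(self, x):
        return ("patched", x)

    return forward


# The module name below is intentionally one that does not exist.
applied = patch_forward(ToyAttention, make_fused_forward,
                        "surely_missing_kernel_pkg_xyz")
```

Because the class attribute is replaced only after the import succeeds, a missing dependency leaves the model on its original code path.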

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `TWINKLE_NPU_DSV4_SAS` | `0` | Enable NPU Sparse Attention |
| `TWINKLE_NPU_DSV4_LI` | `0` | Enable NPU Lightning Indexer |

SAS and LI cannot be enabled at the same time.
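
A minimal sketch of how the two flags and the architecture check might be resolved. The helper names, the accepted truthy strings, and the "DeepseekV4" substring match are assumptions for illustration; only the variable names, the `0` default, the ValueError, and the use of config.architectures come from this PR.

```python
import os
from types import SimpleNamespace

_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of accepted values


def _env_flag(name, env=None):
    """Read a boolean flag; unset variables behave like the default '0'."""
    env = os.environ if env is None else env
    return env.get(name, "0").strip().lower() in _TRUTHY


def is_deepseek_v4(config):
    """Assumed detection rule: any architecture name containing 'DeepseekV4'."""
    architectures = getattr(config, "architectures", None) or []
    return any("DeepseekV4" in name for name in architectures)


def resolve_dsv4_patches(config, env=None):
    """Return which patches to apply, rejecting the SAS + LI combination."""
    use_sas = _env_flag("TWINKLE_NPU_DSV4_SAS", env)
    use_li = _env_flag("TWINKLE_NPU_DSV4_LI", env)
    if use_sas and use_li:
        raise ValueError(
            "TWINKLE_NPU_DSV4_SAS and TWINKLE_NPU_DSV4_LI cannot both be enabled")
    if not is_deepseek_v4(config):
        return {"sas": False, "li": False}
    return {"sas": use_sas, "li": use_li}


# Hypothetical config object; a real one would come from the loaded model.
config = SimpleNamespace(architectures=["DeepseekV4ForCausalLM"])
choice = resolve_dsv4_patches(config, {"TWINKLE_NPU_DSV4_SAS": "1"})
```

Passing the environment as a plain dict keeps the check easy to exercise without touching the process environment.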

Dependencies

mindspeed: Provides SparseAttnSharedKV and npu_lightning_indexer NPU ops
torch_npu: Ascend NPU runtime
transformers: Must include DeepSeek-V4 model support

Testing

Verified on Ascend A3 with DeepSeek-V4-Flash-BF16, 4 layers,
8-card EP, gradient checkpointing enabled:

| MAX_LENGTH | avg_seq_len | Result |
| --- | --- | --- |
| 1024 (1K) | 1024 | PASS |
| 2048 (2K) | 2048 | PASS |
| 4096 (4K) | 4096 | PASS |
| 6144 (6K) | 6144 | PASS |

Time cost:

| Metric | Baseline (SAS OFF) | SAS ON | Delta |
| --- | --- | --- | --- |
| Avg. duration (6 layers) | 4.15 s | 3 s | −27.7% |
| Avg. duration (full layers) | 26.35 s | 15.95 s | −39.5% (about 1.65x faster) |
| Avg. loss | - | - | 1e-3 |
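
The duration deltas can be recomputed from the two reported columns. The sketch below is only arithmetic on the numbers in the table; the helper names are illustrative.

```python
def relative_change(baseline, value):
    """Percentage change of `value` relative to `baseline` (negative = faster)."""
    return (value - baseline) / baseline * 100.0


def speedup(baseline, value):
    """How many times faster the new duration is than the baseline."""
    return baseline / value


# (baseline seconds, SAS-on seconds) as reported in the table above
rows = {
    "6 layers": (4.15, 3.0),
    "full layers": (26.35, 15.95),
}

for name, (base, sas_on) in rows.items():
    print(f"{name}: {relative_change(base, sas_on):+.1f}% duration, "
          f"{speedup(base, sas_on):.2f}x speedup")
```

For the full-layer row this gives a duration change of about −39.5%, which is the same result as a speedup of roughly 1.65x.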

Usage

See cookbook/transformers/deepseek_v4_patch/README.md.

In collaboration with @meichangsu1.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request introduces NPU acceleration patches for DeepSeek-V4, adding Sparse Attention Shared-KV (SAS) and Lightning Indexer (LI) monkey-patches using mindspeed operators, along with a training script and documentation. Feedback highlights critical issues in the training script, including an undefined save_checkpoint function and an incorrectly configured DataLoader that lacks device_mesh for distributed training. Additionally, robustness improvements are recommended in the kernel patches, such as handling potential None values for sparse indices and catching broader exceptions to ensure reliable fallbacks to standard PyTorch implementations.
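
One way to read the robustness suggestions (guarding against missing sparse indices and catching broader exceptions) is a call-time wrapper like the sketch below. It is an illustration under assumptions, not the reviewed code: `with_fallback`, `fused_sum`, and `reference_sum` are hypothetical, and plain Python lists stand in for tensors.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def with_fallback(fused_fn, reference_fn):
    """Prefer `fused_fn`, but use `reference_fn` when indices are missing
    or when the fused path raises any exception."""

    @functools.wraps(reference_fn)
    def wrapper(values, sparse_indices=None):
        if sparse_indices is None:
            # Nothing to select: the fused kernel is assumed to need indices.
            return reference_fn(values, sparse_indices)
        try:
            return fused_fn(values, sparse_indices)
        except Exception:
            # Broad catch so that kernel or shape errors also fall back.
            logger.exception("fused kernel failed; using reference path")
            return reference_fn(values, sparse_indices)

    return wrapper


def reference_sum(values, sparse_indices=None):
    """Reference path: sum the selected entries, or everything if no indices."""
    if sparse_indices is None:
        return sum(values)
    return sum(values[i] for i in sparse_indices)


def fused_sum(values, sparse_indices):
    """Toy 'fused' path that rejects more than two indices."""
    if len(sparse_indices) > 2:
        raise RuntimeError("unsupported index count")
    return sum(values[i] for i in sparse_indices)


safe_sum = with_fallback(fused_sum, reference_sum)
```

Catching `Exception` trades some debuggability for availability, so logging the failure before falling back keeps silent regressions visible.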


0hujun and others added 12 commits May 28, 2026 15:17

- 98e69cd fix: Npu Group MatMul op patchs only in EP
- 1992ca0 Update src/twinkle/kernel/monkey_patch_npu.py (Co-authored-by: gemini-code-assist[bot])
- 598c5ab Update src/twinkle/kernel/monkey_patch_npu.py (Co-authored-by: gemini-code-assist[bot])
- f7dafe5 Merge branch 'modelscope:main' into main
- c6590ce feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- 7d05df5 feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- 1e770a4 feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- 451ef16 feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- 0a6447b feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- 884a78c feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- 9b1e26c feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- aa3caea feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches

gemini-code-assist Bot reviewed Jun 8, 2026


0hujun and others added 4 commits June 9, 2026 09:17

- 491d562 feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches
- 947a94e feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches; Update src/twinkle/kernel/deepseek_v4_npu.py (Co-authored-by: gemini-code-assist[bot])
- 74b2cdb feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches; Update src/twinkle/kernel/deepseek_v4_npu.py (Co-authored-by: gemini-code-assist[bot])
- 57adc45 feat: add DeepSeek-V4 NPU Sparse Attention (SAS) and Lightning Indexer (LI) patches; Update src/twinkle/kernel/deepseek_v4_npu.py (Co-authored-by: gemini-code-assist[bot])

0hujun wants to merge 16 commits into modelscope:main from 0hujun:main
