
feat: support FIA for qwen model on npu device. #1147

Open
sanlio36 wants to merge 12 commits into jd-opensource:main from sanlio36:dev_qwen_fia

Conversation

@sanlio36
Collaborator

No description provided.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces FIA (Fused Interlayer Attention) support for NPU-based Qwen2, Qwen3, and Qwen3-MoE decoder layers by adding a global configuration flag and implementing the required mask and index tensor logic. Review feedback highlights critical bugs where pointers are dereferenced without initialization, potentially causing runtime crashes. Additionally, the implementation is currently limited by a hardcoded sequence length of 2048 for FIA masks, and the Qwen2 variant pack incorrectly uses placeholders instead of the intended index tensors.

ModelInputParams& input_params,
bool is_prefill) {
if (is_prefill) {
*prefill_param_.bs = std::max(1, input_params.num_sequences);
critical

The pointer `prefill_param_.bs` is dereferenced here without being initialized to point to valid memory. This will cause a null pointer dereference or memory corruption at runtime. Ensure that `bs` is properly allocated (e.g., in the constructor) before this assignment.
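A minimal, self-contained sketch of the suggested fix. The types and names below are illustrative stand-ins for the repository's real `prefill_param_` structure, not its actual API; the point is only that the pointer must be backed by owned storage before the assignment runs:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the real parameter struct: the NPU op consumes
// `bs` through a host pointer, so something must own the pointee.
struct FiaPrefillParam {
  int32_t* bs = nullptr;
};

class DecoderLayerFixture {
 public:
  DecoderLayerFixture() {
    // Point `bs` at member storage in the constructor, before any
    // update call can dereference it.
    prefill_param_.bs = &bs_storage_;
  }

  void update_param(int num_sequences, bool is_prefill) {
    if (is_prefill) {
      // Safe now: `bs` points at `bs_storage_`, not at null.
      *prefill_param_.bs = std::max(1, num_sequences);
    }
  }

  int32_t batch_size() const { return *prefill_param_.bs; }

 private:
  int32_t bs_storage_ = 1;
  FiaPrefillParam prefill_param_;
};
```

Wiring the pointer in the constructor keeps every later `*prefill_param_.bs = ...` write valid for the object's whole lifetime.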

bool is_prefill,
int node_id) {
if (is_prefill) {
*prefill_param_.bs = std::max(1, input_params.num_sequences);
critical

The pointer `prefill_param_.bs` is dereferenced without initialization. This is a critical issue that will lead to a crash at runtime.

const ModelInputParams& input_params,
bool is_prefill) {
if (is_prefill) {
*prefill_param_.bs = std::max(1, input_params.num_sequences);
critical

The pointer `prefill_param_.bs` is dereferenced without initialization. This will cause a null pointer dereference.

namespace layer {

namespace {
constexpr int64_t kFiaMaskSeqLen = 2048;
high

Hardcoding `kFiaMaskSeqLen` to 2048 limits the maximum sequence length supported when FIA is enabled. If the input sequence length exceeds 2048, the `fia_attn_mask_` (initialized at line 229) will have incorrect dimensions for the attention operation, potentially leading to incorrect results or crashes. This should be made dynamic or set to a sufficiently large value supported by the model's context window.
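One way to make the limit dynamic, sketched with illustrative names (the rounding helper and the tile constant below are assumptions, not the repository's API): size the mask from the actual maximum prefill sequence length, rounded up to a tile multiple so the mask tensor only needs re-allocation when sequences grow past the current capacity.

```cpp
#include <algorithm>
#include <cstdint>

// Re-allocation granularity for the FIA mask; keeping the old 2048 value
// as the tile means short workloads behave exactly as before.
constexpr int64_t kFiaMaskTile = 2048;

// Round the requested mask length up to the next tile multiple
// (minimum one tile), instead of hardcoding a single fixed length.
inline int64_t fia_mask_seq_len(int64_t max_seq_len) {
  const int64_t len = std::max<int64_t>(max_seq_len, 1);
  return ((len + kFiaMaskTile - 1) / kFiaMaskTile) * kFiaMaskTile;
}
```

The caller would compare the result against the currently allocated `fia_attn_mask_` length and rebuild the mask only when the rounded value increases.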

Comment on lines +409 to +414
  if (is_prefill && FLAGS_enable_fia) {
    node.variantPack.inTensors.at(input_offset++) =  // bsnd padding_idx
        placeholder_;
    node.variantPack.inTensors.at(input_offset++) =  // bsnd unpadding_idx
        placeholder_;
  }
high

The FIA index tensors are not being correctly passed to the variant pack; `placeholder_` is used instead of the actual index tensors `fia_padding_idx_` and `fia_unpadding_idx_`. Additionally, the call to `build_fia_index_tensors` is commented out at line 381. This implementation is incomplete and will cause the FIA feature to fail. Note that calling `build_fia_index_tensors` inside `build_node_variant_pack` may have performance implications due to repeated device allocations and H2D transfers; consider caching these tensors if the sequence lengths haven't changed.

  if (is_prefill && FLAGS_enable_fia) {
    build_fia_index_tensors(input_params, x.size(0));
    node.variantPack.inTensors.at(input_offset++) = 
        atb_speed::Utils::AtTensor2Tensor(fia_padding_idx_);
    node.variantPack.inTensors.at(input_offset++) = 
        atb_speed::Utils::AtTensor2Tensor(fia_unpadding_idx_);
  }
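The caching idea from the review comment could look like the following sketch. `FiaIndexCache` is a hypothetical helper, and plain `std::vector`s stand in for the real device tensors and the H2D upload; only the hit/miss logic is the point:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Rebuild the FIA padding/unpadding index tensors only when the
// per-request sequence lengths actually change, instead of on every
// build_node_variant_pack call.
class FiaIndexCache {
 public:
  // Returns true when the indices were rebuilt (cache miss).
  bool update(const std::vector<int>& seq_lens) {
    if (seq_lens == cached_seq_lens_) return false;  // cache hit: reuse
    cached_seq_lens_ = seq_lens;
    padding_idx_.clear();
    unpadding_idx_.clear();
    // Illustrative index construction: map between the padded
    // [bs, max_len] layout and the unpadded token stream.
    int max_len = 0;
    for (int len : seq_lens) max_len = std::max(max_len, len);
    int token = 0;
    for (std::size_t b = 0; b < seq_lens.size(); ++b) {
      for (int t = 0; t < seq_lens[b]; ++t) {
        padding_idx_.push_back(static_cast<int>(b) * max_len + t);
        unpadding_idx_.push_back(token++);
      }
    }
    return true;  // caller performs the device allocation / H2D copy here
  }

  const std::vector<int>& padding_idx() const { return padding_idx_; }
  const std::vector<int>& unpadding_idx() const { return unpadding_idx_; }

 private:
  std::vector<int> cached_seq_lens_;
  std::vector<int> padding_idx_;
  std::vector<int> unpadding_idx_;
};
```

With this shape, repeated prefill steps over identical sequence lengths skip both the index recomputation and the H2D transfer.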

namespace layer {

namespace {
constexpr int64_t kFiaMaskSeqLen = 2048;
high

Hardcoding `kFiaMaskSeqLen` to 2048 will cause failures for sequences longer than this limit when FIA is enabled, as the attention mask will have incorrect dimensions.

namespace layer {

namespace {
constexpr int64_t kFiaMaskSeqLen = 2048;
high

The hardcoded `kFiaMaskSeqLen` of 2048 limits FIA to short sequences and will cause issues if the input exceeds this length.
