mamba selective_scan CUDA kernel bugs #880

@darxradi3nt

Description

At the end of 2025, several code analysis engines, including AI-based large language models, were used to identify the following issues. The same analysis was then repeated on the updated master branch at commit c1fcec8.

  • smem_running_prefix write missing row offset
    smem_running_prefix is read with a per-row offset but written without one.

  • uint32_t stride overflow
    When the batch offset (batch_id times the batch stride) overflows uint32_t, multiple batch_id values compute the same memory offset.
    Those batch items silently read and write the same memory, producing wrong outputs and wrong gradients.
    No CUDA error is raised. Realistic trigger: bs=17, dim=16384, seqlen=16384.

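More triggers:

| Config                   | u_batch_stride | First bad batch_id | Wraps to |
|--------------------------|----------------|--------------------|----------|
| dim=16384, seqlen=16384  | 268,435,456    | 16 (bs ≥ 17)       | 0        |
| dim=16384, seqlen=32768  | 536,870,912    | 8 (bs ≥ 9)         | 0        |
| dim=16384, seqlen=65536  | 1,073,741,824  | 4 (bs ≥ 5)         | 0        |
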
  • int32_t buffer overflow
    Signed int32_t overflow in the x buffer offset silently corrupts gradients for large batch/dstate configurations.
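
The uint32_t trigger table above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not code from the repository: it assumes the batch stride equals dim * seqlen in elements (which matches the u_batch_stride values listed) and that the offset is computed as batch_id * stride in 32-bit unsigned arithmetic. The helper names `wrapped_offset` and `first_bad_batch_id` are hypothetical and exist only for illustration.

```python
UINT32_MODULUS = 1 << 32  # uint32_t arithmetic wraps modulo 2**32


def wrapped_offset(batch_id: int, batch_stride: int) -> int:
    """Offset obtained if batch_id * batch_stride is evaluated in uint32_t."""
    return (batch_id * batch_stride) % UINT32_MODULUS


def first_bad_batch_id(batch_stride: int) -> int:
    """Smallest batch_id whose true offset no longer fits in uint32_t."""
    # Ceiling division: smallest b with b * batch_stride >= 2**32.
    return -(-UINT32_MODULUS // batch_stride)


if __name__ == "__main__":
    for dim, seqlen in [(16384, 16384), (16384, 32768), (16384, 65536)]:
        # Assumption: contiguous layout, so the batch stride is dim * seqlen.
        stride = dim * seqlen
        bad = first_bad_batch_id(stride)
        print(
            f"dim={dim}, seqlen={seqlen}: stride={stride:,}, "
            f"first bad batch_id={bad}, wraps to {wrapped_offset(bad, stride)}"
        )
```

Under these assumptions, a check such as `batch_size > first_bad_batch_id(dim * seqlen)` could serve as a host-side guard before launching the kernel, until the offset is computed in a 64-bit type.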

PRs: #881, #882, #883
