selective scan cuda kernel x buffer overflow fix by darxradi3nt · Pull Request #883 · state-spaces/mamba

darxradi3nt · 2026-03-26T17:26:26Z

int32 signed overflow in the selective scan CUDA kernel's x buffer offset silently corrupts gradients for large batch/dstate configs.

Scenarios checked (the int32 offset can overflow once `batch*dim*n_chunks*dstate` exceeds INT32_MAX = 2,147,483,647):

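| Model | batch | dim | dstate | seqlen | n_chunks | `batch*dim*n_chunks*dstate` | First overflowing batch_id |
|---|---|---|---|---|---|---|---|
| Mamba-13B | 32 | 5120 | 256 | 65536 | 32 | 1,342,177,280 | ok (no overflow) |
| Mamba-7B | 64 | 4096 | 256 | 67584 | 33 | 2,214,592,512 | 62 |
| Mamba-13B | 64 | 5120 | 256 | 65536 | 32 | 2,684,354,560 | 51 |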
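The last two columns can be re-derived with a few lines of Python. This is a sketch under stated assumptions, not the kernel code: the x buffer's base offset is taken to be `(batch_id * dim + dim_id) * n_chunks * dstate` evaluated in 32-bit signed arithmetic, and `n_chunks = ceil(seqlen / 2048)`, which matches every seqlen/n_chunks pair above but is not stated in the PR. The helper names are hypothetical and not part of mamba.

```python
INT32_MAX = 2**31 - 1
CHUNK_LEN = 2048  # assumption: consistent with the seqlen/n_chunks pairs in the table


def n_chunks(seqlen, chunk_len=CHUNK_LEN):
    # Ceiling division: number of chunks the sequence is split into.
    return -(-seqlen // chunk_len)


def x_elements(batch, dim, dstate, seqlen):
    # batch * dim * n_chunks * dstate, i.e. the product column of the table.
    return batch * dim * n_chunks(seqlen) * dstate


def first_overflowing_batch_id(batch, dim, dstate, seqlen):
    """Smallest batch_id whose largest base offset exceeds INT32_MAX, or None.

    Hypothetical layout assumed here:
    base offset = (batch_id * dim + dim_id) * n_chunks * dstate
    """
    per_row = n_chunks(seqlen) * dstate
    for batch_id in range(batch):
        largest = (batch_id * dim + dim - 1) * per_row  # worst case: dim_id = dim - 1
        if largest > INT32_MAX:
            return batch_id
    return None


scenarios = [
    ("Mamba-13B", 32, 5120, 256, 65536),
    ("Mamba-7B", 64, 4096, 256, 67584),
    ("Mamba-13B", 64, 5120, 256, 65536),
]
for name, batch, dim, dstate, seqlen in scenarios:
    print(name, batch, x_elements(batch, dim, dstate, seqlen),
          first_overflowing_batch_id(batch, dim, dstate, seqlen))
```

Under this assumed layout, every batch_id at or after the reported one has at least some dim_id whose offset no longer fits in int32.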

For Mamba-7B with batch=64 and seqlen=67584, the items with batch_id=62 and batch_id=63 (2 of 64, ~3% of the batch) write to invalid memory, silently producing wrong gradients on every training step.
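To see what happens to those two items, the sketch below (same assumed offset formula; helper names hypothetical) compares the exact base offset with its value after being reduced to 32-bit two's complement. Signed overflow is formally undefined behaviour in C/C++, so wraparound is what typically happens on the hardware rather than a guarantee. Doing the multiplication in 64 bits is the usual remedy for this class of bug; the PR text itself does not show the diff.

```python
def wrap_signed(value, bits):
    """Reduce value to a two's-complement signed integer of the given width.

    Models the typical (not guaranteed) result of a signed overflow.
    """
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def x_base_offset(batch_id, dim_id, dim, n_chunks, dstate, bits):
    # Hypothetical layout: (batch_id * dim + dim_id) * n_chunks * dstate
    exact = (batch_id * dim + dim_id) * n_chunks * dstate
    return wrap_signed(exact, bits)


# Mamba-7B row from the table: dim=4096, n_chunks=33, dstate=256
DIM, NC, DSTATE = 4096, 33, 256
for batch_id in (61, 62, 63):
    off32 = x_base_offset(batch_id, DIM - 1, DIM, NC, DSTATE, bits=32)
    off64 = x_base_offset(batch_id, DIM - 1, DIM, NC, DSTATE, bits=64)
    print(batch_id, off32, off64)
```

For batch_id 61 the two values agree; for 62 and 63 the 32-bit value is negative, so the pointer would land before the start of the x buffer, while the 64-bit value is the correct offset.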


darxradi3nt added 4 commits March 25, 2026 23:25

fixes issue state-spaces#880

9b4a52b

fixes issue state-spaces#880

1871459

fixes issue state-spaces#880

1392be9

fix: x buffer offset corrupts gradients silently for large batch/dstate configs

d9a121d

This was referenced Mar 26, 2026

mamba selective_scan CUDA kernel bugs #880 (Closed)

[bug] int32 signed overflow in buffer x #884 (Open)

darxradi3nt wants to merge 4 commits into state-spaces:main from darxradi3nt:selective_scan_cuda_kernel_fixes_x

darxradi3nt commented Mar 26, 2026
