selective scan cuda kernel x buffer overflow fix #883

Open
darxradi3nt wants to merge 4 commits into state-spaces:main from darxradi3nt:selective_scan_cuda_kernel_fixes_x

@darxradi3nt
Contributor

A signed int32 overflow in the x buffer offset silently corrupts gradients for large batch/dstate configs.
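A minimal host-side sketch of the failure mode, not the kernel code itself. It assumes the x slice for a given (batch_id, dim_id) starts at element offset (batch_id * dim + dim_id) * n_chunks * dstate; the function names are hypothetical. It compares the offset computed in 64 bits with the value a 32-bit signed index would end up holding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: element offset of the x slice for (batch_id, dim_id),
// computed in 64-bit arithmetic. ASSUMED layout: (batch, dim, n_chunks, dstate).
std::int64_t x_offset_64(int batch_id, int dim, int dim_id, int n_chunks, int dstate) {
    assert(batch_id >= 0 && dim > 0 && dim_id >= 0 && dim_id < dim);
    assert(n_chunks > 0 && dstate > 0);
    // Widen the first operand so every following multiply happens in 64 bits.
    return (static_cast<std::int64_t>(batch_id) * dim + dim_id) * n_chunks * dstate;
}

// Hypothetical helper: the value a 32-bit signed index would hold for the same
// offset, i.e. the low 32 bits reinterpreted as signed. The truncation goes
// through uint32_t and memcpy so the demo itself has no undefined behavior.
std::int32_t x_offset_wrapped_32(int batch_id, int dim, int dim_id, int n_chunks, int dstate) {
    const std::uint32_t low =
        static_cast<std::uint32_t>(x_offset_64(batch_id, dim, dim_id, n_chunks, dstate));
    std::int32_t wrapped;
    std::memcpy(&wrapped, &low, sizeof(wrapped));
    return wrapped;
}
```

For batch_id=63, dim=4096, dim_id=0, n_chunks=33, dstate=256, `x_offset_64` returns 2,179,989,504, which exceeds INT32_MAX (2,147,483,647), while `x_offset_wrapped_32` returns -2,114,977,792: a negative index that points outside the buffer. Widening to a 64-bit type before the multiplication, as in `x_offset_64`, is one common way to avoid the wrap.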

Example configurations (INT32_MAX = 2,147,483,647):

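| Model | batch | dim | dstate | seqlen | n_chunks | batch × dim × n_chunks × dstate | First overflowing batch_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mamba-13B | 32 | 5120 | 256 | 65536 | 32 | 1,342,177,280 | ok |
| Mamba-7B | 64 | 4096 | 256 | 67584 | 33 | 2,248,146,944 | 62 |
| Mamba-13B | 64 | 5120 | 256 | 65536 | 32 | 2,684,354,560 | 42 |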

For Mamba-7B with batch=64 and seqlen=67584, the items with batch_id=62 and batch_id=63
write to invalid memory, silently producing wrong gradients on every training step
(2 of 64 items, ~3% of the batch).
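The affected batch items can be estimated with a short host-side check. This is a sketch under the assumption that the offset for (batch_id, dim_id) is (batch_id * dim + dim_id) * n_chunks * dstate in elements; the helper name is hypothetical, and the kernel's actual indexing may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns the first batch_id for which at least one dim_id
// produces an offset above INT32_MAX, or -1 if no batch item overflows.
// ASSUMED offset formula: (batch_id * dim + dim_id) * n_chunks * dstate.
int first_overflowing_batch(int batch, int dim, int n_chunks, int dstate) {
    assert(batch > 0 && dim > 0 && n_chunks > 0 && dstate > 0);
    const std::int64_t limit = std::numeric_limits<std::int32_t>::max();
    const std::int64_t slice = static_cast<std::int64_t>(n_chunks) * dstate;
    for (int b = 0; b < batch; ++b) {
        // Within one batch item the largest offset is reached at dim_id = dim - 1.
        const std::int64_t max_offset =
            (static_cast<std::int64_t>(b) * dim + (dim - 1)) * slice;
        if (max_offset > limit) {
            return b;
        }
    }
    return -1;
}
```

Under that assumption, `first_overflowing_batch(64, 4096, 33, 256)` returns 62, which agrees with the Mamba-7B case described above: batch_id=62 and batch_id=63 are the two affected items.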
