Description
At the end of 2025, various code analysis engines, including AI-based large language models, were used to identify the following issues. The same analysis was then run on the updated master branch at commit c1fcec8.
`smem_running_prefix` write missing row offset
`smem_running_prefix` is read with a per-row offset but written without one.
`uint32_t` stride overflow
When the overflow occurs, multiple batch_id values compute the same memory offset.
Those batch items silently read and write the same memory, producing wrong outputs and wrong gradients.
No CUDA error is raised. Realistic trigger: bs=17, dim=16384, seqlen=16384.
More triggers:
| Config | u_batch_stride | First bad batch_id | Wraps to |
|---|---|---|---|
| dim=16384, seqlen=16384 | 268,435,456 | 16 (bs ≥ 17) | 0 |
| dim=16384, seqlen=32768 | 536,870,912 | 8 (bs ≥ 9) | 0 |
| dim=16384, seqlen=65536 | 1,073,741,824 | 4 (bs ≥ 5) | 0 |
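
The wrap points in the table can be reproduced with a short host-side sketch. It assumes, consistent with the values listed, that `u_batch_stride` equals `dim * seqlen` elements and that the offset is the product `batch_id * u_batch_stride` truncated to 32 unsigned bits. The function names are illustrative and do not come from the repository.

```python
UINT32_MOD = 1 << 32  # number of distinct values a uint32_t can hold


def wrapped_offset(batch_id, batch_stride):
    """Emulate batch_id * batch_stride computed in uint32_t arithmetic."""
    return (batch_id * batch_stride) % UINT32_MOD


def first_overflowing_batch_id(dim, seqlen):
    """Return (first bad batch_id, offset it wraps to).

    Assumption: the batch stride is dim * seqlen elements.
    """
    stride = dim * seqlen
    # Smallest batch_id whose true offset no longer fits in 32 bits.
    first_bad = -(-UINT32_MOD // stride)  # ceiling division
    return first_bad, wrapped_offset(first_bad, stride)


for seqlen in (16384, 32768, 65536):
    print(seqlen, first_overflowing_batch_id(16384, seqlen))
# 16384 (16, 0)
# 32768 (8, 0)
# 65536 (4, 0)
```

With dim=16384 and seqlen=16384, batch_id 16 wraps to offset 0, so it aliases batch_id 0. That is why bs=17 is the smallest affected batch size for that shape.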
`int32_t` buffer overflow
A signed `int32_t` overflow in the x buffer offset silently corrupts gradients for large batch/dstate configurations.
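
A small Python sketch can illustrate how a signed 32-bit offset goes wrong. It only emulates two's-complement wraparound, which is the behavior typically observed in practice (signed overflow is formally undefined behavior in C++). The actual expression for the x buffer offset is not reproduced here, and `to_int32` and `first_overflowing_index` are hypothetical helpers.

```python
INT32_MAX = (1 << 31) - 1


def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value > INT32_MAX else value


def first_overflowing_index(elements_per_item):
    """Smallest index whose offset index * elements_per_item exceeds INT32_MAX.

    elements_per_item is a stand-in for whatever per-item size the kernel
    multiplies by; it is an assumption made for illustration.
    """
    return INT32_MAX // elements_per_item + 1


# An offset one past INT32_MAX becomes a large negative number.
print(to_int32(INT32_MAX + 1))  # -2147483648
```

A negative or wrapped offset still yields an address the kernel will dereference, which is consistent with the silent corruption described above.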