Non-record: GatedDeltaNet SSM via fla library — 1.2907 bpb, 15.79MB #970

Open
dnldsz wants to merge 2 commits into openai:main from dnldsz:submission-clean

Conversation


@dnldsz dnldsz commented Mar 27, 2026

Summary

  • Gated DeltaNet selective state space model using production Triton kernels from the flash-linear-attention (fla) library
  • 12 layers, 384d, ~13.7M params, 15.79 MB int8+zlib (under 16MB limit)
  • val_bpb: 1.2907 (int8+zlib roundtrip), 1.2781 pre-quant
  • Trained on 8×H100 for 10 minutes (~4,962 steps at 121ms/step)
  • Non-record unlimited compute track

Architecture

Replaces attention with Gated DeltaNet (delta-rule SSM):

  • State update: S_t = α_t · S_{t-1} · (I − β_t · k_t kᵀ_t) + β_t · v_t · kᵀ_t
  • Chunk-parallel scan via fla's fused Triton kernels (chunk_size=64)
  • U-Net skip connections, LeakyReLU(0.5)² MLP, BigramHash embedding, z-loss, polynomial softcap
  • Muon optimizer for 2D weights; delta-rule params (a_proj, b_proj, A_log, dt_bias) explicitly routed to Adam
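The state update above can be sanity-checked with a naive per-timestep recurrence. The sketch below is an illustrative numpy reference of the gated delta rule, not the chunk-parallel fused Triton path from fla; the function name and argument layout are hypothetical:

```python
import numpy as np

def gated_delta_net_scan(alpha, beta, k, v):
    """Naive sequential gated delta-rule scan (reference only).

    alpha: (T,) decay gates in (0, 1]
    beta:  (T,) write strengths in (0, 1]
    k:     (T, d_k) keys (assumed unit-norm)
    v:     (T, d_v) values
    Returns the final state S of shape (d_v, d_k).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))
    I = np.eye(d_k)
    for t in range(T):
        # S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T
        erase = I - beta[t] * np.outer(k[t], k[t])
        S = alpha[t] * (S @ erase) + beta[t] * np.outer(v[t], k[t])
    return S
```

With `alpha = beta = 1` and orthogonal unit keys, the rank-1 erase term removes any prior content along `k_t` before writing, so each key retrieves exactly its associated value from the final state.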

Setup

```
pip install flash-linear-attention einops sentencepiece
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Notes

Submitted as a non-record SSM baseline. GDN weights compress less efficiently than transformer weights (~2.8× vs ~3.7×), limiting model width to 384d at 16MB. Future work: QAT, hybrid SSM+attention, longer context.
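The int8+zlib size measurement referenced above can be reproduced along these lines. This is a minimal sketch assuming per-tensor symmetric quantization (the function name is hypothetical, and the submission's actual quantization scheme may differ):

```python
import zlib
import numpy as np

def int8_zlib_size(tensors, level=9):
    """Quantize each float tensor to int8 with a per-tensor symmetric
    scale, zlib-compress the concatenated bytes, and return
    (compressed_size_in_bytes, dequantized_tensors)."""
    payload = bytearray()
    deq = []
    for w in tensors:
        scale = float(np.abs(w).max()) / 127.0
        if scale == 0.0:
            scale = 1.0  # all-zero tensor: any scale works
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        payload += q.tobytes()
        deq.append(q.astype(np.float32) * scale)
    return len(zlib.compress(bytes(payload), level)), deq
```

Running the dequantized weights back through evaluation is what gives the round-trip val_bpb; the compressed byte count is what must stay under the 16MB budget.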
