
Non-record: 24.7M params · int6 · Binary U-Net/SmearGate/BigramHash · 1.5hr · RTX 5060 Ti 16GB #997

Open
randy06122001-boop wants to merge 1 commit into openai:main from randy06122001-boop:main

Conversation


@randy06122001-boop commented Mar 28, 2026

24.7M params · int6 · Binary U-Net/SmearGate/BigramHash · 1.5hr · RTX 5060 Ti 16GB

Non-record submission - trained on RTX 5060 Ti 16GB for 1.5 hours.

Approach

  • Binary U-Net / SmearGate / BigramHash
  • Quantized to 6 bits (int6)
  • 24.7M parameters

Int6 Quantization + 10L (5 Encoder / 5 Decoder) + Muon + 3x ReLU² MLP + SmearGate + BigramHash + SWA + RTX 5060 Ti

val_bpb: 1.4182 (roundtrip, seed=1337) | 11.63 MB artifact | NVIDIA RTX 5060 Ti, 2469 steps (~1.5h)

Results (seed=1337, RTX 5060 Ti)

| Metric              | Value                       |
| ------------------- | --------------------------- |
| val_bpb (roundtrip) | 1.4182                      |
| val_loss            | 2.3946                      |
| Steps               | 2469                        |
| ms/step             | 2187.4                      |
| Training time       | 5,400 s (~1.5 h)            |
| Artifact size       | 11,633,008 bytes (11.63 MB) |
| Parameters          | 24,730,704                  |

Hardware & Environment

This run was executed on a local NVIDIA GeForce RTX 5060 Ti (16 GB). The run was done on Windows, where Triton is unavailable, so torch.compile was disabled. It demonstrates that architectural features like SmearGate and int6 quantization can meaningfully improve quality-per-byte even on consumer-grade hardware.

Architecture

  • 10 transformer layers (U-Net style: 5 encoder, 5 decoder blocks)
  • Model Dimension: 512, 8 heads, 4 KV heads (GQA)
  • Quantization: Int6 per-row for block weights, Int8/FP16 for others.
  • 3x MLP expansion: (hidden=1536) with ReLU² activation.
  • SmearGate: causal blending of token embeddings with previous context.
  • BigramHash: 4096-bucket hash embedding for consecutive token pairs.
  • ResidMix: Learned residual blending across blocks.
  • Embedding: Tied 1024-vocab embedding.
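SmearGate and BigramHash are each described in a single line above; a minimal PyTorch sketch of one plausible reading (the PR's exact formulation may differ, and all module/parameter names here are illustrative) looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmearGate(nn.Module):
    """Causally blends each token embedding with the previous token's
    embedding via a learned per-channel gate (sketch, not the PR's code)."""
    def __init__(self, dim: int):
        super().__init__()
        # Init at -4 so sigmoid(gate) ~ 0.02 and the module starts near identity.
        self.gate = nn.Parameter(torch.full((dim,), -4.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); previous-token embedding, zero-padded at t=0
        prev = F.pad(x, (0, 0, 1, 0))[:, :-1, :]
        g = torch.sigmoid(self.gate)
        return (1 - g) * x + g * prev  # convex causal blend

class BigramHash(nn.Module):
    """Hashes each (previous, current) token pair into 4096 buckets and
    looks up a learned bucket embedding -- a cheap bigram feature."""
    def __init__(self, dim: int, n_buckets: int = 4096):
        super().__init__()
        self.n_buckets = n_buckets
        self.emb = nn.Embedding(n_buckets, dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (batch, seq) token ids; pad with id 0 at t=0
        prev = F.pad(ids, (1, 0))[:, :-1]
        h = (prev * 1000003 + ids) % self.n_buckets  # simple multiplicative hash
        return self.emb(h)
```

The output of `BigramHash` would typically be added to the (tied) token embeddings before the first block, giving the model direct access to pair statistics without a full bigram table.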

Optimization

  • Muon Optimizer: Newton-Schulz iteration (5 steps) for all matrix parameters with 0.04 weight decay.
  • AdamW: Used for scalar parameters and embeddings with 0.04 weight decay.
  • SWA (Stochastic Weight Averaging): Averaged 20 checkpoints during the warmdown phase for better generalization.
  • Orthogonal Initialization: Used for all matrix weights.

Compression

  • Int6 + Zstd-22: Int6 quantization for block weights allows for significantly more parameters/layers within the 16MB limit, achieving a 3.73x payload compression ratio.
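Per-row symmetric int6 quantization can be sketched as follows (a minimal NumPy version; the PR's actual rounding, bit-packing, and Zstd-22 step are not shown here, and the helper names are illustrative):

```python
import numpy as np

def quantize_int6_per_row(w: np.ndarray):
    """Symmetric per-row int6 quantization: each row gets its own scale so
    values map into [-31, 31] (6 bits, one sign bit)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 31.0
    scale = np.where(scale == 0, 1.0, scale)          # guard all-zero rows
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int6(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale
```

Stored at 6 bits per weight (four values packed into three bytes) plus per-row scales, and with Zstd on top, this is how substantially more parameters fit under the 16 MB artifact limit than an fp16 export would allow.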

Evaluation

  • Roundtrip Validation: Final metrics verified after dequantization to ensure performance persists in the exported artifact.
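Roundtrip validation means evaluating the model after its weights have gone through the same quantize/dequantize cycle as the exported artifact. A hedged sketch of that loop (all function and parameter names here are hypothetical; the PR's evaluation script is not shown):

```python
import torch

@torch.no_grad()
def roundtrip_val_loss(model, val_loader, quantize_fn, dequantize_fn):
    """Replaces every Linear weight with its quantize->dequantize roundtrip,
    then evaluates, so the metric reflects the artifact rather than the
    full-precision training weights."""
    for m in model.modules():
        if isinstance(m, torch.nn.Linear):
            q, scale = quantize_fn(m.weight)
            m.weight.copy_(dequantize_fn(q, scale))
    model.eval()
    total, n = 0.0, 0
    for x, y in val_loader:
        logits = model(x)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1))
        total += loss.item() * y.numel()
        n += y.numel()
    return total / n
```

Reporting val_bpb this way (rather than on the pre-quantization weights) is what guarantees the 1.4182 figure survives in the 11.63 MB export.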

Files

    • quantized model
    • training script
    • training logs
