
Non-record: 33.6M Int5 GPTQ + Legal s_0-only TTT (val_bpb=1.1182)#1004

Open
ibarrajo wants to merge 4 commits into openai:main from ibarrajo:approach-b

Conversation

@ibarrajo

Summary

val_bpb: 1.1182 (s_0 score only, single seed; additional seeds pending)

Resubmission addressing PR #991's closure. Key fix: reports ONLY the cumulative s_0 score from the first scoring pass. No post-TTT re-evaluation. No temperature calibration on re-scored tokens.

What changed from PR #991

  • Removed illegal post-TTT re-eval — PR #991 reported s_1 (re-scored after training). This PR reports s_0 (scored before training on each chunk).
  • Removed temperature calibration — T=0.98 on re-scored tokens was illegal. Removed entirely.
  • Increased pruning 3%→5% — ensures artifact <16MB across all seeds.
  • All assertions pass: train+gptq < 600s, artifact < 16MB, eval < 600s.
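The score-first constraint in the first bullet can be sketched as a single loop: each chunk contributes to the reported loss before the model ever trains on it, and nothing is re-scored afterwards. This is a minimal, framework-agnostic sketch; `score` and `train_step` are illustrative names, not the submission's actual API.

```python
import math

def eval_s0(model, chunks):
    """Score-first TTT: accumulate only the first-pass (s_0) loss of
    each chunk, then let the model do its TTT update on that chunk.
    `model.score(chunk)` -> (nll_in_nats, n_tokens) and
    `model.train_step(chunk)` are hypothetical interfaces."""
    total_nll, total_tokens = 0.0, 0
    for chunk in chunks:
        nll, n = model.score(chunk)   # s_0: scored BEFORE training
        total_nll += nll
        total_tokens += n
        model.train_step(chunk)       # TTT update; no re-scoring after
    # bits per byte, assuming byte-level tokens
    return total_nll / total_tokens / math.log(2)
```

The key property is that later chunks benefit from training on earlier chunks, but no chunk's score ever reflects training on that same chunk.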

Results

  Metric                          Value
  Base (no TTT, sliding window)   1.1246
  Legal s_0 TTT                   1.1182
  TTT improvement                 -0.0064
  Artifact                        15,535,414 bytes (465 KB headroom)
  Train+GPTQ                      593.8 s / 600 s budget
  Eval                            ~414 s / 600 s budget

Rule compliance

  • s_0 only — each token scored BEFORE training, cumulative loss reported
  • No re-scoring — no second eval pass after TTT
  • No temperature calibration — removed
  • GPTQ within training budget — 593.8s total
  • Artifact < 16MB — 15.5MB with 465KB headroom
  • Eval < 600s — ~414s
  • Assertions enforce all constraints at runtime
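The runtime assertions mentioned in the last bullet amount to three hard checks. A minimal sketch, with our own names for the limits (the 465 KB headroom figure implies a 16,000,000-byte limit rather than 16 MiB, so that is the value assumed here):

```python
ARTIFACT_LIMIT = 16_000_000   # 16 MB; 15,535,414 B leaves ~465 KB headroom
TIME_LIMIT_S = 600.0          # applies to train+GPTQ and to eval separately

def enforce_budgets(artifact_bytes: int, train_gptq_s: float, eval_s: float) -> None:
    """Hard-fail the run if any competition constraint is violated.
    Identifiers are illustrative, not the submission's actual code."""
    assert artifact_bytes < ARTIFACT_LIMIT, f"artifact {artifact_bytes} B over limit"
    assert train_gptq_s < TIME_LIMIT_S, f"train+GPTQ took {train_gptq_s:.1f} s"
    assert eval_s < TIME_LIMIT_S, f"eval took {eval_s:.1f} s"
```

With the numbers reported above, `enforce_budgets(15_535_414, 593.8, 414.0)` passes; exceeding any one limit raises an `AssertionError`.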

Architecture

33.6M params (d=576, MLP 3.5x=1792, 11L), int5 GPTQ, XSA-all(11), BigramHash(8192), EMA(0.997), 5% magnitude pruning. Based on PR #576 by @cmcdnd.
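The 5% magnitude pruning step, which was raised from 3% to keep the artifact under the size limit on every seed, can be sketched as follows. This is a generic global magnitude-pruning sketch over a flat weight list; the submission's real routine presumably operates on tensors.

```python
def magnitude_prune(weights, frac=0.05):
    """Global magnitude pruning: zero the smallest `frac` of weights by
    absolute value. Zeroed weights compress well, which is what shrinks
    the serialized artifact. Illustrative sketch, not the PR's code."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * frac)
    threshold = flat[k] if k < len(flat) else float("inf")
    # keep weights at or above the threshold, zero the rest
    return [0.0 if abs(w) < threshold else w for w in weights]
```

Raising `frac` from 0.03 to 0.05 trades a little accuracy for a smaller post-compression artifact, which is the margin that keeps all seeds under the limit.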

🤖 Generated with Claude Code

ibarrajo and others added 4 commits March 27, 2026 17:03
Train larger (33.6M params, d=576, MLP 3.5x), quantize harder (int5 GPTQ).
Legal score-first TTT (AdamW, cosine LR, 3 epochs) + post-TTT temperature
calibration (T=0.98). 3-seed mean 1.1145 BPB (std 0.0003). Based on PR openai#576.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Train 590s + GPTQ 3.8s = 593.9s < 600s (within budget)
- 3% pruning → artifact 15.3MB with 711KB headroom
- Added assertions: artifact < 16MB, train+gptq < 600s, eval < 600s
- Seed 1337: val_bpb=1.1148

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Seed 1337: 1.1148 BPB, artifact 15.3MB, train+gptq 593.9s
Seed 42:   1.1154 BPB, artifact 15.3MB, train+gptq 593.7s
Seed 2025: 1.1148 BPB, artifact 15.8MB, train+gptq 593.9s
Mean: 1.1150 (std 0.0003)

All seeds: artifact < 16MB, train+gptq < 600s, eval < 600s.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Reports ONLY s_0 (cumulative first-pass score) — no re-eval after TTT
- 5% pruning → artifact 15.5MB (465KB headroom)
- Train+GPTQ: 593.8s < 600s
- Eval (sliding + TTT): ~414s < 600s
- Addresses PR openai#991 closure: removed illegal post-TTT re-scoring

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
