
Non-record: 33.6M Int5 GPTQ + Legal s_0-only TTT (val_bpb=1.1182)#1004

Open
ibarrajo wants to merge 4 commits into openai:main from ibarrajo:approach-b

Conversation

@ibarrajo

Summary

val_bpb: 1.1182 (s_0 score only, single seed; additional seeds pending)

Resubmission addressing PR #991's closure. Key fix: reports ONLY the cumulative s_0 score from the first scoring pass. No post-TTT re-evaluation. No temperature calibration on re-scored tokens.

What changed from PR #991

  • Removed illegal post-TTT re-eval — PR #991 reported s_1 (re-scored after training). This PR reports s_0 (scored before training on each chunk).
  • Removed temperature calibration — T=0.98 on re-scored tokens was illegal. Removed entirely.
  • Increased pruning 3%→5% — ensures artifact <16MB across all seeds.
  • All assertions pass: train+gptq < 600s, artifact < 16MB, eval < 600s.
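The score-first constraint in the first bullet can be sketched as a single loop: each chunk contributes to the reported loss before the model ever trains on it, and nothing is re-scored afterwards. This is a minimal, framework-agnostic sketch; `score` and `train_step` are illustrative names, not the submission's actual API.

```python
import math

def eval_s0(model, chunks):
    """Score-first TTT: accumulate only the first-pass (s_0) loss of
    each chunk, then let the model do its TTT update on that chunk.
    `model.score(chunk)` -> (nll_in_nats, n_tokens) and
    `model.train_step(chunk)` are hypothetical interfaces."""
    total_nll, total_tokens = 0.0, 0
    for chunk in chunks:
        nll, n = model.score(chunk)   # s_0: scored BEFORE training
        total_nll += nll
        total_tokens += n
        model.train_step(chunk)       # TTT update; no re-scoring after
    # bits per byte, assuming byte-level tokens
    return total_nll / total_tokens / math.log(2)
```

The key property is that later chunks benefit from training on earlier chunks, but no chunk's score ever reflects training on that same chunk.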

Results

  Metric                          Value
  Base (no TTT, sliding window)   1.1246
  Legal s_0 TTT                   1.1182
  TTT improvement                 -0.0064
  Artifact                        15,535,414 bytes (465 KB headroom)
  Train+GPTQ                      593.8 s / 600 s budget
  Eval                            ~414 s / 600 s budget

Rule compliance

  • s_0 only — each token scored BEFORE training, cumulative loss reported
  • No re-scoring — no second eval pass after TTT
  • No temperature calibration — removed
  • GPTQ within training budget — 593.8s total
  • Artifact < 16MB — 15.5MB with 465KB headroom
  • Eval < 600s — ~414s
  • Assertions enforce all constraints at runtime
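The runtime assertions mentioned in the last bullet amount to three hard checks. A minimal sketch, with our own names for the limits (the 465 KB headroom figure implies a 16,000,000-byte limit rather than 16 MiB, so that is the value assumed here):

```python
ARTIFACT_LIMIT = 16_000_000   # 16 MB; 15,535,414 B leaves ~465 KB headroom
TIME_LIMIT_S = 600.0          # applies to train+GPTQ and to eval separately

def enforce_budgets(artifact_bytes: int, train_gptq_s: float, eval_s: float) -> None:
    """Hard-fail the run if any competition constraint is violated.
    Identifiers are illustrative, not the submission's actual code."""
    assert artifact_bytes < ARTIFACT_LIMIT, f"artifact {artifact_bytes} B over limit"
    assert train_gptq_s < TIME_LIMIT_S, f"train+GPTQ took {train_gptq_s:.1f} s"
    assert eval_s < TIME_LIMIT_S, f"eval took {eval_s:.1f} s"
```

With the numbers reported above, `enforce_budgets(15_535_414, 593.8, 414.0)` passes; exceeding any one limit raises an `AssertionError`.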

Architecture

33.6M params (d=576, MLP 3.5x=1792, 11L), int5 GPTQ, XSA-all(11), BigramHash(8192), EMA(0.997), 5% magnitude pruning. Based on PR #576 by @cmcdnd.
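The 5% magnitude pruning step, which was raised from 3% to keep the artifact under the size limit on every seed, can be sketched as follows. This is a generic global magnitude-pruning sketch over a flat weight list; the submission's real routine presumably operates on tensors.

```python
def magnitude_prune(weights, frac=0.05):
    """Global magnitude pruning: zero the smallest `frac` of weights by
    absolute value. Zeroed weights compress well, which is what shrinks
    the serialized artifact. Illustrative sketch, not the PR's code."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * frac)
    threshold = flat[k] if k < len(flat) else float("inf")
    # keep weights at or above the threshold, zero the rest
    return [0.0 if abs(w) < threshold else w for w in weights]
```

Raising `frac` from 0.03 to 0.05 trades a little accuracy for a smaller post-compression artifact, which is the margin that keeps all seeds under the limit.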

🤖 Generated with Claude Code

ibarrajo and others added 4 commits March 27, 2026 17:03
Train larger (33.6M params, d=576, MLP 3.5x), quantize harder (int5 GPTQ).
Legal score-first TTT (AdamW, cosine LR, 3 epochs) + post-TTT temperature
calibration (T=0.98). 3-seed mean 1.1145 BPB (std 0.0003). Based on PR openai#576.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Train 590s + GPTQ 3.8s = 593.9s < 600s (within budget)
- 3% pruning → artifact 15.3MB with 711KB headroom
- Added assertions: artifact < 16MB, train+gptq < 600s, eval < 600s
- Seed 1337: val_bpb=1.1148

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Seed 1337: 1.1148 BPB, artifact 15.3MB, train+gptq 593.9s
Seed 42:   1.1154 BPB, artifact 15.3MB, train+gptq 593.7s
Seed 2025: 1.1148 BPB, artifact 15.8MB, train+gptq 593.9s
Mean: 1.1150 (std 0.0003)

All seeds: artifact < 16MB, train+gptq < 600s, eval < 600s.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Reports ONLY s_0 (cumulative first-pass score) — no re-eval after TTT
- 5% pruning → artifact 15.5MB (465KB headroom)
- Train+GPTQ: 593.8s < 600s
- Eval (sliding + TTT): ~414s < 600s
- Addresses PR openai#991 closure: removed illegal post-TTT re-scoring

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
