
[Non-record] SOTA Monolith v4.0 #987

Open
Evreu1pro wants to merge 2 commits into openai:main from Evreu1pro:main

Conversation

@Evreu1pro

I’m working on SOTA Monolith v4.0 — an 11-layer Transformer optimized for the 16MB limit. Here’s the core tech:

Strictly Causal TTT: Real-time adaptation where the model predicts token t, the loss is recorded for evaluation, and only then are the Q/V weights updated before token t+1 is processed. No data leakage; 100% legal for the challenge.
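A minimal sketch of the strictly causal ordering, using a plain linear fast-weight layer in NumPy rather than the model's actual Q/V projections; the function name, loss, and learning rate are illustrative, not from the repo. The key invariant is that the prediction at position t is made with weights that have only seen gradients from positions before t:

```python
import numpy as np

def causal_ttt(xs, ys, W, lr=0.1):
    """Strictly causal test-time training on a linear fast-weight layer.

    For each position t: predict with the CURRENT weights, record the
    loss, and only then update W from (x_t, y_t). The prediction at
    t+1 has therefore seen gradients from tokens <= t, never from t+1.
    """
    losses = []
    for x, y in zip(xs, ys):
        pred = W @ x                       # uses pre-update weights only
        err = pred - y
        losses.append(0.5 * float(err @ err))
        W = W - lr * np.outer(err, x)      # update AFTER the loss is recorded
    return losses, W
```

Because the loss is logged before the update, the recorded per-token losses are exactly what a frozen-at-that-step model would have scored, which is what makes this ordering leakage-free.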

LeakyReLU(0.5)² MLP: Swapped SwiGLU for a two-matrix MLP with a squared LeakyReLU(0.5) activation, saving ~30% of the MLP parameters (two projection matrices instead of SwiGLU’s three). This efficiency gain let me push the depth to 11 layers while staying under 16MB.
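A forward-pass sketch of that block in NumPy, with illustrative names (the actual hidden width and any scaling are not stated in the PR). Dropping SwiGLU's gate matrix takes the MLP from 3·d·h to 2·d·h parameters, which is where the ~33% saving comes from:

```python
import numpy as np

def leaky_relu_sq(z, slope=0.5):
    # LeakyReLU(0.5) followed by squaring:
    # z**2 for z >= 0, (0.5*z)**2 for z < 0 -- smooth, cheap, gate-free
    return np.where(z >= 0, z, slope * z) ** 2

def mlp(x, W_up, W_down):
    # Two-matrix MLP: 2*d*h params vs SwiGLU's 3*d*h (gate + up + down)
    return leaky_relu_sq(x @ W_up) @ W_down
```

Note that squaring makes the activation non-negative, so the sign information on the negative side is carried only through the magnitude scaling, not the sign itself.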

Parallel Muon: Implemented distributed Newton-Schulz orthogonalization (5 steps). It’s tuned for 8xH100 so the orthogonalization compute overlaps with the gradient all-reduce.
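A single-device sketch of the 5-step Newton-Schulz iteration (the quintic variant with the coefficients from the public Muon implementation); the distributed sharding and all-reduce overlap mentioned above are omitted. It drives the singular values of the gradient toward 1 without an explicit SVD:

```python
import numpy as np

def newton_schulz5(G, steps=5, eps=1e-7):
    """Odd quintic Newton-Schulz iteration that pushes the singular
    values of G toward 1 (approximate orthogonalization), as in Muon.
    Only matmuls are needed, so it is cheap and GPU-friendly."""
    a, b, c = 3.4445, -4.7750, 2.0315   # Muon's tuned quintic coefficients
    X = G / (np.linalg.norm(G) + eps)   # Frobenius-normalize: spectral norm <= 1
    transposed = G.shape[0] > G.shape[1]
    if transposed:                      # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X               # singular values: s -> a*s + b*s^3 + c*s^5
    return X.T if transposed else X
```

After 5 steps the singular values land in a band around 1 rather than exactly at 1; that looser tolerance is what lets the iteration stay this short.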

Mixed STE QAT: Using Int5 for the MLP and Int6 for attention, with straight-through estimators during training to keep BPB low under heavy compression.
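A minimal NumPy sketch of the fake-quantization forward pass (symmetric, per-tensor; the real setup may use per-channel scales, which the PR doesn't specify). The straight-through estimator is the backward-pass half of the trick: the round() is treated as identity when gradients flow back, which NumPy can't show, so it's noted in the docstring:

```python
import numpy as np

def fake_quant(w, bits):
    """Symmetric per-tensor fake quantization: snap weights to a signed
    `bits`-bit grid, then dequantize back to float. Under STE training,
    the backward pass treats the round() as identity so gradients still
    reach the full-precision master weights."""
    qmax = 2 ** (bits - 1) - 1                    # 15 for int5, 31 for int6
    scale = np.max(np.abs(w)) / qmax + 1e-12      # map largest weight to the grid edge
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale
```

With this scheme the MLP weights see at most 31 distinct values (int5) and attention weights at most 63 (int6), so the network learns to tolerate that rounding during training instead of discovering it at export time.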

BigramHash & Dynamic Context: Hashed bigram embeddings saved 1.5MB on the embedding tables, and a context window that expands from 256 to 1024 tokens keeps evaluation stable.
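A sketch of the hashing idea: instead of a full vocab² bigram table, each (prev, cur) token pair is hashed into a small shared embedding table, trading occasional collisions for memory. The mixing constant and table size below are arbitrary choices for illustration, not values from the repo:

```python
import numpy as np

def bigram_hash(prev_ids, cur_ids, table_size):
    """Hash each (prev, cur) token pair into a shared embedding table
    of `table_size` rows. Collisions are tolerated; the saving is the
    difference between vocab**2 rows and table_size rows."""
    # multiply-xor mix in uint64; overflow wraps, which is fine for hashing
    h = (prev_ids.astype(np.uint64) * np.uint64(1000003)) ^ cur_ids.astype(np.uint64)
    return (h % np.uint64(table_size)).astype(np.int64)
```

The returned indices gather rows from a small learned table whose output is added to (or concatenated with) the unigram token embedding, recovering some bigram signal at a fraction of the memory.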

