
[Non-record] SOTA Monolith v4.0 #987

Open
Evreu1pro wants to merge 2 commits into openai:main from Evreu1pro:main

Conversation

@Evreu1pro

I’m working on SOTA Monolith v4.0 — an 11-layer Transformer optimized for the 16MB limit. Here’s the core tech:

Strictly Causal TTT: Real-time adaptation where the model predicts token t, the loss is recorded for evaluation, and only then are the Q/V weights updated before token t+1 is processed. No data leakage; 100% legal for the challenge.
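A minimal sketch of the strictly causal ordering, using a plain linear fast-weight layer in NumPy rather than the model's actual Q/V projections; the function name, loss, and learning rate are illustrative, not from the repo. The key invariant is that the prediction at position t is made with weights that have only seen gradients from positions before t:

```python
import numpy as np

def causal_ttt(xs, ys, W, lr=0.1):
    """Strictly causal test-time training on a linear fast-weight layer.

    For each position t: predict with the CURRENT weights, record the
    loss, and only then update W from (x_t, y_t). The prediction at
    t+1 has therefore seen gradients from tokens <= t, never from t+1.
    """
    losses = []
    for x, y in zip(xs, ys):
        pred = W @ x                       # uses pre-update weights only
        err = pred - y
        losses.append(0.5 * float(err @ err))
        W = W - lr * np.outer(err, x)      # update AFTER the loss is recorded
    return losses, W
```

Because the loss is logged before the update, the recorded per-token losses are exactly what a frozen-at-that-step model would have scored, which is what makes this ordering leakage-free.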

LeakyReLU(0.5)² MLP: Swapped SwiGLU for a two-matrix MLP with a squared LeakyReLU(0.5) activation, saving ~30% of the MLP parameters (two projection matrices instead of SwiGLU’s three). This efficiency gain let me push the depth to 11 layers while staying under 16MB.
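A forward-pass sketch of that block in NumPy, with illustrative names (the actual hidden width and any scaling are not stated in the PR). Dropping SwiGLU's gate matrix takes the MLP from 3·d·h to 2·d·h parameters, which is where the ~33% saving comes from:

```python
import numpy as np

def leaky_relu_sq(z, slope=0.5):
    # LeakyReLU(0.5) followed by squaring:
    # z**2 for z >= 0, (0.5*z)**2 for z < 0 -- smooth, cheap, gate-free
    return np.where(z >= 0, z, slope * z) ** 2

def mlp(x, W_up, W_down):
    # Two-matrix MLP: 2*d*h params vs SwiGLU's 3*d*h (gate + up + down)
    return leaky_relu_sq(x @ W_up) @ W_down
```

Note that squaring makes the activation non-negative, so the sign information on the negative side is carried only through the magnitude scaling, not the sign itself.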

Parallel Muon: Implemented distributed Newton-Schulz orthogonalization (5 steps). It’s tuned for 8xH100 so the orthogonalization compute overlaps with the gradient all-reduce.
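A single-device sketch of the 5-step Newton-Schulz iteration (the quintic variant with the coefficients from the public Muon implementation); the distributed sharding and all-reduce overlap mentioned above are omitted. It drives the singular values of the gradient toward 1 without an explicit SVD:

```python
import numpy as np

def newton_schulz5(G, steps=5, eps=1e-7):
    """Odd quintic Newton-Schulz iteration that pushes the singular
    values of G toward 1 (approximate orthogonalization), as in Muon.
    Only matmuls are needed, so it is cheap and GPU-friendly."""
    a, b, c = 3.4445, -4.7750, 2.0315   # Muon's tuned quintic coefficients
    X = G / (np.linalg.norm(G) + eps)   # Frobenius-normalize: spectral norm <= 1
    transposed = G.shape[0] > G.shape[1]
    if transposed:                      # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X               # singular values: s -> a*s + b*s^3 + c*s^5
    return X.T if transposed else X
```

After 5 steps the singular values land in a band around 1 rather than exactly at 1; that looser tolerance is what lets the iteration stay this short.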

Mixed STE QAT: Using Int5 for the MLP and Int6 for attention, with straight-through estimators during training to keep BPB low under heavy compression.
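A minimal NumPy sketch of the fake-quantization forward pass (symmetric, per-tensor; the real setup may use per-channel scales, which the PR doesn't specify). The straight-through estimator is the backward-pass half of the trick: the round() is treated as identity when gradients flow back, which NumPy can't show, so it's noted in the docstring:

```python
import numpy as np

def fake_quant(w, bits):
    """Symmetric per-tensor fake quantization: snap weights to a signed
    `bits`-bit grid, then dequantize back to float. Under STE training,
    the backward pass treats the round() as identity so gradients still
    reach the full-precision master weights."""
    qmax = 2 ** (bits - 1) - 1                    # 15 for int5, 31 for int6
    scale = np.max(np.abs(w)) / qmax + 1e-12      # map largest weight to the grid edge
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale
```

With this scheme the MLP weights see at most 31 distinct values (int5) and attention weights at most 63 (int6), so the network learns to tolerate that rounding during training instead of discovering it at export time.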

BigramHash & Dynamic Context: Hashed bigram embeddings saved 1.5MB on the embedding tables, and a context window that expands from 256 to 1024 tokens keeps evaluation stable.
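A sketch of the hashing idea: instead of a full vocab² bigram table, each (prev, cur) token pair is hashed into a small shared embedding table, trading occasional collisions for memory. The mixing constant and table size below are arbitrary choices for illustration, not values from the repo:

```python
import numpy as np

def bigram_hash(prev_ids, cur_ids, table_size):
    """Hash each (prev, cur) token pair into a shared embedding table
    of `table_size` rows. Collisions are tolerated; the saving is the
    difference between vocab**2 rows and table_size rows."""
    # multiply-xor mix in uint64; overflow wraps, which is fine for hashing
    h = (prev_ids.astype(np.uint64) * np.uint64(1000003)) ^ cur_ids.astype(np.uint64)
    return (h % np.uint64(table_size)).astype(np.int64)
```

The returned indices gather rows from a small learned table whose output is added to (or concatenated with) the unigram token embedding, recovering some bigram signal at a fraction of the memory.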

