Every file in the repository. Click any directory in the tree to jump to its description table.
.aider.model.metadata.json — Aider model token limits and cost
.aider.model.settings.yml — Aider model behavior settings
.env.example — Docker Compose environment template
.gitignore — Git ignore rules
atlas.conf.example — K3s deployment configuration template
docker-compose.yml — 5-service Docker Compose stack
pyproject.toml — Python package definition (atlas CLI entry point)
LICENSE — GNU Affero General Public License v3.0 (AGPL-3.0)
README.md — Project overview, benchmarks, setup
CHANGELOG.md — Release history
CODE_OF_CONDUCT.md — Community guidelines
CONTRIBUTING.md — Contributor guide
atlas-proxy/ — Go proxy: agent loop, grammar, tool calls
atlas/ — Python CLI package
benchmark/ — Benchmark runner and datasets
geometric-lens/ — Scoring, RAG, routing, pattern cache
v3-service/ — V3 pipeline HTTP wrapper
  main.py — HTTP server, pipeline orchestrator, LLM/Lens/Sandbox adapters
  Dockerfile — Container build (CPU PyTorch, port 8070)
sandbox/ — Isolated code execution
  executor_server.py — FastAPI server, 8 language executors, linting, error classification
  Dockerfile — Container build (Python, Node, Go, Rust, gcc)
inference/ — llama-server configuration
scripts/ — Build, deploy, and training automation
tests/ — Test suite
  validate_tests.py — Test runner entry point
  conftest.py — Pytest shared fixtures
  infrastructure/
  integration/
  v3/ — V3 module unit tests (23 files listed)
    test_plan_search.py, test_div_sampling.py, test_budget_forcing.py, test_blend_asc.py, test_reasc.py, test_s_star.py, test_candidate_selection.py, test_failure_analysis.py, test_constraint_refinement.py, test_pr_cot.py, test_derivation_chains.py, test_refinement_loop.py, test_metacognitive.py, test_ace_pipeline.py, test_self_test_gen.py, test_lens_feedback.py, test_embedding_store.py, test_ablation_analysis.py, test_ewc.py, test_replay_buffer.py, test_enhanced_retrain.py, test_phase4_validation.py, test_sandbox_adapter.py
docs/ — Documentation
v3_ablation_results/ — Published ablation data
  README.md — Data format documentation
  config.json — Ablation run configuration
  preflight.json — Pre-run system checks
  condition_a_baseline/ — Baseline (54.9%, 599 tasks)
  condition_b_phase1/ — +Phase 1 (67.3%, 599 tasks)
  condition_c_phase1_2/ — +Phase 1+2 (67.3%, 599 tasks)
  condition_d_phase1_3/ — +Phase 1+3 (74.6%, 599 tasks)
  Each condition contains summary.json, v3_lcb/results.json, and v3_lcb/per_task/ (599 per-task JSON files).

| File | Description |
| --- | --- |
| .aider.model.metadata.json | Aider model metadata: token limits (32K), cost ($0, local), provider (openai) |
| .aider.model.settings.yml | Aider behavior: whole-file edit format, repo map enabled, streaming on, temperature 0.3 |
| .env.example | Docker Compose env template: model path, ports (8080/8099/8070/30820/8090), context size |
| atlas.conf.example | K3s deployment config: model, GPU layers, parallel slots, NodePorts, namespace |
| docker-compose.yml | 5-service stack: llama-server, geometric-lens, v3-service, sandbox, atlas-proxy |
| pyproject.toml | Python package: atlas CLI entry point (atlas.cli.repl:run), requires Python >= 3.9 |
| .gitignore | Ignores: model weights, `__pycache__`, .aider* (except config files), logs, .env |


| File | Description |
| --- | --- |
| README.md | Project overview, 74.6% LCB benchmark, setup instructions, hardware requirements |
| CHANGELOG.md | Release history: V3.0.1 (2026-04-05), V3.0, V2.5, V2 |
| LICENSE | GNU Affero General Public License v3.0 (AGPL-3.0) |
| CODE_OF_CONDUCT.md | Contributor Covenant Code of Conduct |
| CONTRIBUTING.md | How to contribute: fork, branch, test, PR workflow |

atlas-proxy/ — Agent Loop (Go)
The core of the V3.0.1 CLI. Receives OpenAI-compatible requests from Aider, runs a grammar-constrained agent loop with 8 tools, and routes complex files through the V3 pipeline.

| File | Lines | Description |
| --- | --- | --- |
| main.go | 2890 | HTTP server, /v1/chat/completions handler, verify-repair pipeline, best-of-K, format normalization, error analysis, Lens scoring, sandbox testing |
| agent.go | 740 | Agent loop iteration, JSON schema generation, system prompt building, LLM calls with grammar constraint, exploration budget, truncation recovery |
| tools.go | 905 | 8 tool definitions (read/write/edit/delete file, run command, search, list dir, plan tasks), per-file tier classifier, V3 routing |
| aider_format.go | 697 | Converts agent results to Aider whole-file blocks, streams real-time status with icons, project directory detection, delete fast-path |
| grammar.go | 192 | JSON schema (oneOf: tool_call/text/done) and GBNF grammar for constrained output, tool documentation generation |
| types.go | 390 | AgentContext, ToolDef, ToolResult, tier definitions (T0-T3), max turns per tier, permission types |
| v3_bridge.go | 120 | HTTP bridge to Python V3 service with SSE progress streaming, Lens scoring bridge |
| v3_adapter.go | 177 | Translates file write requests into V3GenerateRequest with project context, framework detection, constraint extraction |
| build_verify.go | 157 | Per-file-type verification: tsc, py_compile, go build, cargo check, gcc, bash -n. Framework-specific overrides |
| project.go | 226 | Detects language (Node/Python/Rust/Go/C/Shell), framework (Next.js/Flask/Express), build/dev/test commands |
| permissions.go | 150 | Allow/deny rules, dangerous pattern detection (rm -rf, .env, credentials), mode-based access |
| parallel.go | 213 | plan_tasks executor: topological sort, concurrent sub-task execution (15-turn budget each) |
| go.mod | n/a | Go module definition |
| Dockerfile | n/a | Multi-stage Go build for containerized deployment |

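
The plan_tasks executor in parallel.go is described above as a topological sort followed by concurrent sub-task execution. The sketch below illustrates that scheduling idea in Python using the standard-library graphlib module. It is not the Go implementation: the `deps` mapping, the `run_task` callable, and the `run_plan` name are hypothetical, and the 15-turn budget per sub-task is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_plan(deps, run_task, max_workers=4):
    """Run tasks in dependency order, executing independent tasks concurrently.

    deps maps a task id to the set of task ids it depends on (hypothetical shape).
    run_task is a caller-supplied callable that executes one task.
    Returns the completed waves in order; each wave is a sorted list of task ids.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
            list(pool.map(run_task, ready))     # run one wave concurrently and wait for it
            sorter.done(*ready)                 # unblock tasks that depended on this wave
            waves.append(ready)
    return waves
```

Running whole waves keeps the sketch short. A scheduler that starts each task as soon as its own dependencies finish would use the same `get_ready()` and `done()` calls with per-task completion callbacks.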
Standalone REPL for direct interaction with ATLAS services (without Aider).

| File | Description |
| --- | --- |
| cli/repl.py | Main entry point (atlas command). Interactive REPL with /solve, /bench, /status, /help. Pipe mode support. |
| cli/client.py | HTTP client for llama-server, Geometric Lens, sandbox. Health checks, generation (batch + streaming), scoring, sandbox execution. |
| cli/display.py | Terminal formatting: banner, colors, status blocks, prompts, separators |
| cli/commands/solve.py | /solve: generate code from LLM, extract from think blocks, score via Lens, test via sandbox |
| cli/commands/bench.py | /bench: delegates to benchmark.v3_runner with dataset/strategy/task-count args |
| cli/commands/status.py | /status: check health of llama-server, Lens, sandbox |

benchmark/ — Benchmark Infrastructure
Runner infrastructure for evaluating LLM code generation across multiple datasets.

| File | Description |
| --- | --- |
| runner.py | Core execution: function mode + stdio mode, LLM API calls, ChatML formatting, code extraction |
| v2_runner.py | V2 benchmark runner: phases 0-6, telemetry, Mode A/B, crash recovery |
| v3_runner.py | V3 benchmark runner: full pipeline with ablation conditions A-F |
| v2_report.py | Markdown report generator from benchmark results |
| cli.py | CLI entry point: atlas benchmark --humaneval --dry-run etc. |
| config.py | BenchmarkConfig loaded from atlas.conf |
| models.py | Data models: BenchmarkTask, AttemptResult, TaskResult, BenchmarkRun |
| best_of_k.py | Best-of-K candidate evaluation with scoring |
| geo_learning.py | Geometric learning integration for benchmarks |

benchmark/datasets/ — Dataset Loaders
Each loader downloads from HuggingFace (JSON rows API, no pyarrow) and normalizes to BenchmarkTask format.

| File | Tasks | Eval Mode | Description |
| --- | --- | --- | --- |
| base.py | n/a | n/a | Abstract BaseDataset class with download, parse, validate |
| humaneval.py | 164 | function | HumanEval function completion |
| mbpp.py | 500 | function | MBPP with 3-shot [BEGIN]/[DONE] format |
| evalplus_humaneval.py | 164 | function | HumanEval+ (EvalPlus augmented tests) |
| evalplus_mbpp.py | 500 | function | MBPP+ (EvalPlus augmented tests) |
| livecodebench.py | 599 | stdio | LiveCodeBench v5 from bzantium mirror |
| gpqa.py | 198 | mcq | GPQA Diamond from OpenAI blob CSV |
| ifbench.py | 300 | ifbench | IFBench instruction-following with loose eval |
| scicode.py | ~80 | function | SciCode cross-domain scientific coding |

benchmark/analysis/ — Analysis Utilities
benchmark/custom/ — Custom Tasks
benchmark/v3/ — V3 Pipeline Modules
19 Python modules implementing the V3 code generation pipeline. Each module follows a Config + Event + Controller pattern.

| Module | Phase | Description |
| --- | --- | --- |
| plan_search.py | 1A | 3-step pipeline: extract constraints -> construct plans -> generate code. 3 plans default, max 7. |
| div_sampling.py | 1B | 12 perturbations: 4 roles + 4 instructions + 4 styles. Modular selection by candidate index. |
| budget_forcing.py | 1C | 5 tiers (nothink/light/standard/hard/extreme). Wait injection on premature thinking termination. Energy-to-tier sigmoid mapping. |
| blend_asc.py | 2A | Adaptive K from C(x) energy: 4 bands mapping energy to k=1-12 and budget tier. |
| reasc.py | 2B | Early stopping: energy < 0.10 AND bottom-10% logprob confidence > -0.5. |
| s_star.py | 2C | Tiebreaking: generate edge-case inputs where candidates differ, sandbox both, majority wins. |
| candidate_selection.py | n/a | 4 strategies: lens (min energy), random, logprob (max mean), oracle (first pass). |
| failure_analysis.py | 3A | Categorize failures: wrong_algorithm, implementation_bug, edge_case_miss, time_limit, format_error, partial_correct. |
| constraint_refinement.py | 3B | Generate refined hypotheses from failure analysis. Cosine distance >= 0.15 prevents repetition. |
| pr_cot.py | 3C | 4 perspectives (logical_consistency, information_completeness, biases, alternative_solutions) x (analysis + repair) = 8 LLM calls. |
| derivation_chains.py | 3D | Decompose into <= 5 sub-problems, sandbox-verify each, compose final. 7+ LLM calls. |
| refinement_loop.py | 3E | Orchestrator: FailureAnalysis -> ConstraintRefiner -> CodeGen -> Test -> Learn. 2 iters, 120s budget. |
| metacognitive.py | 3F | Model failure pattern library with frequency tracking, compensation injection, effectiveness monitoring. |
| ace_pipeline.py | 3G | Evolving playbooks: Generator-Reflector-Curator pipeline with confidence decay. |
| self_test_gen.py | util | Generate test cases from problem description. Multiple parsing fallbacks. 50% majority threshold. |
| lens_feedback.py | util | Online Lens recalibration: collect pass/fail embeddings, trigger retrain at 50-sample intervals. |
| embedding_store.py | util | Binary append-only embedding storage: task_id + candidate_index + label + 4096-dim float32 vector. |
| ablation_analysis.py | util | Bootstrap significance tests, pass rate computation across ablation conditions. |

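
The reasc.py row states an early-stopping rule: energy < 0.10 AND bottom-10% logprob confidence > -0.5. The sketch below shows one way to read that rule. It assumes "bottom-10% logprob confidence" means the mean of the lowest 10% of token log-probabilities, which this map does not confirm; the function names are hypothetical.

```python
import math


def bottom_fraction_mean(logprobs, fraction=0.10):
    """Mean of the lowest `fraction` of token log-probabilities (at least one token)."""
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    k = max(1, math.ceil(len(logprobs) * fraction))
    lowest = sorted(logprobs)[:k]
    return sum(lowest) / k


def should_stop_early(energy, logprobs, energy_threshold=0.10, confidence_threshold=-0.5):
    """Apply both conditions from the table: low energy AND confident weakest tokens."""
    return (
        energy < energy_threshold
        and bottom_fraction_mean(logprobs) > confidence_threshold
    )
```

Under this reading, a candidate with low energy but a few very uncertain tokens still continues sampling, because both conditions must hold.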
geometric-lens/ — Core Service

| File | Description |
| --- | --- |
| main.py | FastAPI server: 26 endpoints for scoring, indexing, routing, caching, pattern management |
| pipeline.py | RAG orchestrator: retrieve chunks + patterns -> collect signals -> estimate difficulty -> route -> generate -> verify |
| config.py | ServerConfig (port 8001), Redis URL, API keys, YAML config loading |
| storage.py | ProjectMetadata CRUD for indexed projects |
| verify_loop.py | Verify-repair loop with retry and escalation |
| sandbox_client.py | HTTP client for sandbox code execution |
| sandbox_analysis.py | Classify sandbox execution results |
| requirements.txt | Dependencies: FastAPI, uvicorn, torch (CPU), pydantic, redis, tree-sitter |
| Dockerfile | Python 3.11-slim, CPU PyTorch, port 8099 |

geometric-lens/geometric_lens/ — Scoring Models

| File | Description |
| --- | --- |
| cost_field.py | C(x): 4096->512->128->1 MLP (SiLU + Softplus). 2.16M params. Contrastive ranking loss. |
| metric_tensor.py | G(x): PCA(4096->128) + diagonal metric tensor + input-dependent modulation. Code exists, not deployed. |
| service.py | Service layer: lazy model loading, evaluate_combined() (single embedding for C(x)+G(x)), verdict thresholds, hot-reload |
| training.py | train_cost_field() (200 epochs), retrain_cost_field_bce() (production retrain with class weights, early stopping) |
| embedding_extractor.py | Calls llama-server POST /v1/embeddings, handles pooled and per-token responses, mean pooling |
| ewc.py | Elastic Weight Consolidation: Fisher Information Matrix, penalty term, prevents catastrophic forgetting |
| correction.py | Natural gradient correction: -alpha * G_inv * grad_C. PCA projection/unprojection. Correctability score. |
| replay_buffer.py | Domain-stratified reservoir sampling. 30% old / 70% new training mix. JSON persistence. |

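
The 2.16M parameter figure quoted for the C(x) cost field can be checked from the layer sizes alone. The short calculation below assumes plain fully connected layers with a bias on every layer and no other trainable parameters, which matches the stated total but is not confirmed by this map.

```python
def mlp_param_count(layer_sizes):
    """Count weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


total = mlp_param_count([4096, 512, 128, 1])
print(total)  # 2163457, about 2.16M
```

Almost all of the parameters (2,097,664 of them) sit in the first 4096->512 layer.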
geometric-lens/indexer/ — RAG Indexing

| File | Description |
| --- | --- |
| ast_parser.py | tree-sitter Python AST parsing: classes, functions, imports, top-level blocks. Fallback regex parser. |
| tree_builder.py | Build hierarchical TreeIndex from parsed files. Supports incremental updates. |
| bm25_index.py | Inverted index with BM25 scoring (k1=1.5, b=0.75). CamelCase/snake_case tokenization. |
| summarizer.py | LLM-generated summaries for tree nodes. |
| persistence.py | Save/load TreeIndex + BM25Index as JSON to disk. |

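
To make the bm25_index.py row concrete, here is a minimal sketch of an inverted index with BM25 scoring at k1=1.5 and b=0.75, plus a tokenizer that splits CamelCase and snake_case identifiers. The class name, the regular expression, and the choice of a Lucene-style IDF are assumptions made for illustration; the actual module may differ in all of them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split on non-alphanumerics (including underscores) and CamelCase boundaries."""
    tokens = []
    for word in re.split(r"[^A-Za-z0-9]+", text):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", word))
    return [t.lower() for t in tokens]


class TinyBM25:
    """Hypothetical in-memory index: term -> {doc_id: term frequency}."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
        total_len = sum(len(t) for t in self.doc_tokens.values())
        self.avg_len = total_len / max(1, len(self.doc_tokens))
        self.postings = {}
        for doc_id, tokens in self.doc_tokens.items():
            for term, tf in Counter(tokens).items():
                self.postings.setdefault(term, {})[doc_id] = tf

    def score(self, query):
        """Return (doc_id, score) pairs sorted by descending BM25 score."""
        n_docs = len(self.doc_tokens)
        scores = Counter()
        for term in set(tokenize(query)):
            hits = self.postings.get(term, {})
            # Lucene-style IDF, chosen because it never goes negative
            idf = math.log(1 + (n_docs - len(hits) + 0.5) / (len(hits) + 0.5))
            for doc_id, tf in hits.items():
                length_norm = 1 - self.b + self.b * len(self.doc_tokens[doc_id]) / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return scores.most_common()
```

With b=0.75, shorter documents that contain the query terms rank above longer ones with the same term counts, which is why a bare class declaration would outrank a longer function signature for the same query.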
geometric-lens/retriever/ — RAG Retrieval

| File | Description |
| --- | --- |
| bm25_search.py | BM25 keyword search: min_score=0.1, top_k=20. Strong match detection (threshold=3.0). |
| tree_search.py | LLM-guided tree traversal: max_depth=6, max_reasoning_calls=40. Scores children 0-10. |
| hybrid.py | Routes between bm25_first, tree_only, and both strategies. Deduplication + score normalization. |

geometric-lens/router/ — Confidence Router

| File | Description |
| --- | --- |
| route_selector.py | Thompson Sampling with Beta(alpha,beta) posteriors. 4 routes: CACHE_HIT(1) -> FAST_PATH(50) -> STANDARD(300) -> HARD_PATH(1500). |
| difficulty_estimator.py | Weighted fusion of 4 signals -> D(x). Adjusts weights when Geometric Lens is available. |
| signal_collector.py | Collects: pattern_cache_score, retrieval_confidence, query_complexity, geometric_energy, gx_score. |
| feedback_recorder.py | Records route outcomes to Redis for Thompson Sampling posterior updates. |
| fallback_chain.py | Retry escalation: CACHE_HIT -> FAST_PATH -> STANDARD -> HARD_PATH -> terminal. |

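
The route selector is described as Thompson Sampling over Beta(alpha, beta) posteriors, with outcomes fed back to update them. The sketch below shows the bare mechanism for the four route names in the table. It keeps posteriors in memory rather than Redis and ignores the per-route numbers in parentheses and the difficulty signal, so it is an illustration of the technique, not the router's actual logic; the class and method names are hypothetical.

```python
import random

ROUTES = ["CACHE_HIT", "FAST_PATH", "STANDARD", "HARD_PATH"]


class ThompsonRouter:
    """Hypothetical in-memory router with one Beta posterior per route."""

    def __init__(self, routes=ROUTES, seed=None):
        self.rng = random.Random(seed)
        # [alpha, beta] starts at Beta(1, 1), a uniform prior on the success rate
        self.posteriors = {name: [1.0, 1.0] for name in routes}

    def select(self):
        """Draw one sample per route and pick the route with the largest draw."""
        samples = {
            name: self.rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self.posteriors.items()
        }
        return max(samples, key=samples.get)

    def record(self, route, success):
        """Update the posterior: successes increment alpha, failures increment beta."""
        self.posteriors[route][0 if success else 1] += 1.0
```

Because each selection samples from the posteriors instead of taking their means, routes with little feedback still get tried occasionally, while routes with a long record of success are chosen most of the time.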
geometric-lens/cache/ — Pattern Cache

| File | Description |
| --- | --- |
| pattern_store.py | Redis-backed storage: STM (100 max), LTM, PERSISTENT tiers. Sorted set management. |
| pattern_matcher.py | BM25 index over pattern summaries. Normalized [0,1] similarity scores. |
| pattern_extractor.py | LLM-driven extraction of reusable patterns from successful task solutions. |
| pattern_scorer.py | Ebbinghaus decay: recency-weighted composite score for STM/LTM promotion. |
| co_occurrence.py | Tracks patterns used together. Graph traversal for linked pattern retrieval. |
| consolidator.py | Category surprise tracking for pattern novelty assessment. |
| seed_patterns.py | Bootstrap patterns for initial cache population. |

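
The pattern scorer is described as an Ebbinghaus decay feeding a recency-weighted composite score. The classic forgetting curve is retention = exp(-t / S), where t is the time since last use and S is a stability constant. The sketch below applies that curve to a success rate. The one-day default stability, the use of success rate as the other factor, and both function names are assumptions for illustration; this map does not state how the real composite is built.

```python
import math


def retention(age_seconds, stability_seconds):
    """Ebbinghaus forgetting curve: 1.0 when just used, decaying toward 0 with age."""
    return math.exp(-age_seconds / stability_seconds)


def pattern_score(success_rate, age_seconds, stability_seconds=86_400.0):
    """Hypothetical composite: historical success rate weighted by recency."""
    return success_rate * retention(age_seconds, stability_seconds)
```

A pattern used an hour ago keeps about 96% of its success-rate score under these assumptions, while one last used a day ago keeps about 37%.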
v3-service/ — V3 Pipeline HTTP Wrapper

| File | Description |
| --- | --- |
| main.py | HTTP server (port 8070). Pipeline orchestrator: Phase 0 (probe) -> Phase 2 (allocate K) -> Phase 1 (generate) -> Selection -> Phase 3 (repair). LLMAdapter, EmbedAdapter, SandboxAdapter, BuildVerifier. Imports all 19 V3 modules. |
| Dockerfile | Python 3.11, CPU PyTorch, copies benchmark/ for V3 module access. Port 8070. |

sandbox/ — Isolated Code Execution

| File | Description |
| --- | --- |
| executor_server.py | FastAPI server (port 8020). 8 language executors with compilation, pytest/pylint for Python, syntax checking, error classification (15 types), output truncation. |
| Dockerfile | Python 3.11-slim + Node.js 20 + Go 1.22 + Rust stable + gcc/g++. tmpfs workspace, read-only root. |

inference/ — llama-server Configuration

| File (under tests/) | Description |
| --- | --- |
| validate_tests.py | Test runner entry point |
| conftest.py | Pytest shared fixtures |
| infrastructure/test_llm.py | llama-server health and generation tests |
| infrastructure/test_sandbox.py | Sandbox execution tests |
| integration/test_e2e_flow.py | End-to-end pipeline flow test |
| integration/test_e2e_training.py | End-to-end Lens training test |
| v3/ | Unit tests (23 files listed): test_plan_search.py, test_div_sampling.py, test_budget_forcing.py, test_blend_asc.py, test_reasc.py, test_s_star.py, test_candidate_selection.py, test_failure_analysis.py, test_constraint_refinement.py, test_pr_cot.py, test_derivation_chains.py, test_refinement_loop.py, test_metacognitive.py, test_ace_pipeline.py, test_self_test_gen.py, test_lens_feedback.py, test_embedding_store.py, test_ablation_analysis.py, test_ewc.py, test_replay_buffer.py, test_enhanced_retrain.py, test_phase4_validation.py, test_sandbox_adapter.py |


| File (under docs/) | Description |
| --- | --- |
| ARCHITECTURE.md | Two-layer architecture with 13 Mermaid diagrams, component breakdowns, sequence diagrams |
| API.md | HTTP API reference: all endpoints for all 5 services, request/response formats |
| CLI.md | CLI usage, streaming output format, workflow examples, troubleshooting |
| CONFIGURATION.md | Every environment variable across all services, internal constants, Aider config |
| MAP.md | This file: the repository file map |
| SETUP.md | Installation: Docker Compose, bare-metal, K3s |
| TROUBLESHOOTING.md | Common issues and solutions |

docs/reports/ — Studies, Status, Migration
v3_ablation_results/ — Published Evidence
Per-task pass/fail data for all V3 ablation conditions. 2,396 task results across 4 conditions. See README for data format.

| Condition | Directory | Pass@1 | Tasks |
| --- | --- | --- | --- |
| A (baseline) | condition_a_baseline/ | 54.9% | 599 |
| B (+Phase 1) | condition_b_phase1/ | 67.3% | 599 |
| C (+Phase 1+2) | condition_c_phase1_2/ | 67.3% | 599 |
| D (+Phase 1+3) | condition_d_phase1_3/ | 74.6% | 599 |

Each condition contains summary.json, v3_lcb/results.json, and 599 per-task JSON files in v3_lcb/per_task/.
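
The figures in this section can be cross-checked with a little arithmetic. The snippet below recomputes the total number of task results and the percentage-point gain of each condition over the baseline from the values in the table. The implied pass counts are back-calculated from the rounded percentages, so treat them as estimates and not as values read from summary.json.

```python
TASKS_PER_CONDITION = 599
PASS_AT_1 = {"A": 54.9, "B": 67.3, "C": 67.3, "D": 74.6}  # percentages from the table

# 4 conditions x 599 tasks
total_results = TASKS_PER_CONDITION * len(PASS_AT_1)

# Percentage-point gain of each condition over condition A
gain_over_baseline = {
    name: round(rate - PASS_AT_1["A"], 1) for name, rate in PASS_AT_1.items()
}

# Estimated passing-task counts implied by the rounded percentages
implied_passes = {
    name: round(rate / 100 * TASKS_PER_CONDITION) for name, rate in PASS_AT_1.items()
}

print(total_results)       # 2396
print(gain_over_baseline)  # {'A': 0.0, 'B': 12.4, 'C': 12.4, 'D': 19.7}
print(implied_passes)      # {'A': 329, 'B': 403, 'C': 403, 'D': 447}
```

The total matches the 2,396 task results quoted above, and conditions B and C show the same Pass@1, so adding Phase 2 on top of Phase 1 did not change the pass rate in this data.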