ATLAS Repository Map

This map lists every file in the repository. Each directory in the tree has a matching description table below.

File Tree

.aider.model.metadata.json — Aider model token limits and cost
.aider.model.settings.yml — Aider model behavior settings
.env.example — Docker Compose environment template
.gitignore — Git ignore rules
atlas.conf.example — K3s deployment configuration template
docker-compose.yml — 5-service Docker Compose stack
pyproject.toml — Python package definition (atlas CLI entry point)
LICENSE — GNU Affero General Public License v3.0 (AGPL-3.0)
README.md — Project overview, benchmarks, setup
CHANGELOG.md — Release history
CODE_OF_CONDUCT.md — Community guidelines
CONTRIBUTING.md — Contributor guide
atlas-proxy/ — Go proxy: agent loop, grammar, tool calls
- main.go — HTTP server, chat handler, verify-repair, tier classification
- agent.go — Agent loop, LLM dispatch, exploration budget, error recovery
- tools.go — 8 tool definitions + executors, tier classifier
- aider_format.go — Agent results to Aider whole-file format
- grammar.go — JSON schema + GBNF grammar generation
- types.go — Shared types: ToolCall, AgentContext, tiers
- v3_bridge.go — Go-to-Python V3 service SSE bridge
- v3_adapter.go — File requests to V3 pipeline format
- build_verify.go — Per-language build verification commands
- project.go — Language/framework detection
- permissions.go — Permission rules and deny patterns
- parallel.go — plan_tasks executor with dependency graph
- go.mod — Go module definition
- Dockerfile — Multi-stage Go build
- README.md — Proxy documentation
- atlas-proxy — Compiled Go binary (gitignored in production)
atlas/ — Python CLI package
- __init__.py
- cli/
  - repl.py — Interactive REPL entry point
  - client.py — HTTP client for llama-server, Lens, sandbox
  - display.py — Terminal output formatting and colors
  - __init__.py, __main__.py
  - commands/
    - solve.py — /solve command: generate + score + test
    - bench.py — /bench command: run V3 benchmarks
    - status.py — /status command: service health checks
    - __init__.py
benchmark/ — Benchmark runner and datasets
- runner.py — Code execution, LLM API calls, ChatML formatting
- v2_runner.py — V2 benchmark runner (phases 0-6, telemetry)
- v3_runner.py — V3 benchmark runner entry point
- v2_report.py — Markdown report generator
- cli.py — CLI entry point (atlas benchmark)
- config.py — BenchmarkConfig from atlas.conf
- models.py — BenchmarkTask, AttemptResult, TaskResult dataclasses
- best_of_k.py — Best-of-K candidate evaluation
- geo_learning.py — Geometric learning integration
- run_v2_benchmark.sh — V2 benchmark launch script
- measure_bok_latency.sh — Best-of-K latency measurement
- README.md — Benchmark documentation
- datasets/
  - base.py — Abstract BaseDataset class
  - humaneval.py — HumanEval (164 tasks, function completion)
  - mbpp.py — MBPP (500 tasks, 3-shot format)
  - evalplus_humaneval.py — HumanEval+ (EvalPlus augmented)
  - evalplus_mbpp.py — MBPP+ (EvalPlus augmented)
  - livecodebench.py — LiveCodeBench v5 (599 tasks, stdio)
  - gpqa.py — GPQA Diamond (198 MCQ)
  - ifbench.py — IFBench (300 instruction-following)
  - scicode.py — SciCode (~80 scientific coding)
  - __init__.py
- analysis/
  - cost_analysis.py — Cost/token analysis
  - hardware_info.py — GPU/CPU detection
  - pass_at_k.py — pass@k metric calculation
  - __init__.py
- custom/
  - tasks.json — 100 custom benchmark tasks
  - tasks.json.lock — Task lock file
  - validate.py — Custom task validation
  - __init__.py
- v3/ — V3 pipeline modules (19 files)
  - plan_search.py — PlanSearch (1A): 3 constraint-based plans
  - div_sampling.py — DivSampling (1B): 12 perturbations
  - budget_forcing.py — BudgetForcing (1C): 5 tiers, Wait injection
  - blend_asc.py — BlendASC (2A): adaptive K allocation
  - reasc.py — ReASC (2B): early stopping
  - s_star.py — S* (2C): differential tiebreaking
  - candidate_selection.py — 4 selection strategies
  - failure_analysis.py — FailureAnalysis (3A): 6 failure categories
  - constraint_refinement.py — ConstraintRefiner (3B): cosine filtering
  - pr_cot.py — PR-CoT (3C): 4-perspective repair
  - derivation_chains.py — DerivationChains (3D): sub-problem decomposition
  - refinement_loop.py — RefinementLoop (3E): orchestrator
  - metacognitive.py — Metacognitive (3F): failure pattern library
  - ace_pipeline.py — ACE (3G): playbook learning
  - self_test_gen.py — Model-generated test cases
  - lens_feedback.py — Online Lens recalibration
  - embedding_store.py — Binary embedding persistence
  - ablation_analysis.py — Statistical ablation analysis
  - __init__.py
geometric-lens/ — Scoring, RAG, routing, pattern cache
- main.py — FastAPI server (26 endpoints)
- pipeline.py — RAG pipeline orchestrator
- config.py — Server/Redis/API configuration
- storage.py — Project metadata CRUD
- verify_loop.py — Verify-repair loop logic
- sandbox_client.py — Sandbox HTTP client
- sandbox_analysis.py — Sandbox result analysis
- requirements.txt — Python dependencies (CPU PyTorch)
- Dockerfile — Container build (port 8099)
- .dockerignore
- geometric_lens/ — Scoring models
  - cost_field.py — C(x): 4096->512->128->1 MLP
  - metric_tensor.py — G(x): diagonal metric tensor + PCA
  - service.py — Service layer: evaluate_combined(), scoring API
  - training.py — Training pipeline: contrastive loss, retraining
  - embedding_extractor.py — llama-server /v1/embeddings client
  - ewc.py — Elastic Weight Consolidation
  - correction.py — Natural gradient correction engine
  - replay_buffer.py — Domain-stratified experience replay
  - __init__.py
- indexer/ — RAG indexing
  - ast_parser.py — tree-sitter AST parsing
  - tree_builder.py — Hierarchical code index
  - bm25_index.py — BM25 inverted index
  - summarizer.py — LLM-generated node summaries
  - persistence.py — JSON index persistence
  - __init__.py
- retriever/ — RAG retrieval
  - bm25_search.py — BM25 keyword search
  - tree_search.py — LLM-guided tree traversal
  - hybrid.py — Hybrid retriever (routes bm25/tree/both)
  - __init__.py
- router/ — Confidence routing
  - route_selector.py — Thompson Sampling route selection
  - difficulty_estimator.py — 4-signal difficulty fusion
  - signal_collector.py — Signal collection
  - feedback_recorder.py — Redis-backed outcome recording
  - fallback_chain.py — Route escalation chain
  - __init__.py
- cache/ — Pattern cache
  - pattern_store.py — Redis STM/LTM/PERSISTENT storage
  - pattern_matcher.py — BM25 pattern matching
  - pattern_extractor.py — LLM-driven pattern extraction
  - pattern_scorer.py — Ebbinghaus decay scoring
  - co_occurrence.py — Co-occurrence graph
  - consolidator.py — Category surprise tracking
  - seed_patterns.py — Bootstrap seed patterns
  - __init__.py
v3-service/ — V3 pipeline HTTP wrapper
- main.py — HTTP server, pipeline orchestrator, LLM/Lens/Sandbox adapters
- Dockerfile — Container build (CPU PyTorch, port 8070)
sandbox/ — Isolated code execution
- executor_server.py — FastAPI server, 8 language executors, linting, error classification
- Dockerfile — Container build (Python, Node, Go, Rust, gcc)
inference/ — llama-server configuration
- Dockerfile.v31 — V3.1 9B model build (used by docker-compose)
- Dockerfile — Base llama.cpp build
- Dockerfile.mtp — Multi-Token Prediction experimental build
- entrypoint-v3.1-9b.sh — K3s 9B entrypoint (flash-attn, mlock, 4 slots)
- entrypoint-v3-specdec.sh — K3s 14B + spec decode entrypoint
- entrypoint.sh — Default entrypoint
- entrypoint-embed.sh — Dedicated embedding server entrypoint
- entrypoint-mtp.sh — MTP experimental entrypoint
- patches/fix-embeddings-spec-decode.patch — Fix for embeddings + spec decode conflict
- templates/Qwen3-custom.jinja — Custom Qwen3 chat template
- templates/Qwen3-no-think.jinja — Qwen3 template with thinking suppressed
scripts/ — Build, deploy, and training automation
- install.sh — K3s + GPU Operator installation
- uninstall.sh — K3s teardown
- build-containers.sh — Build all container images
- deploy-9b.sh — Deploy 9B model to K3s
- generate-manifests.sh — K3s manifests from atlas.conf
- download-models.sh — Download model weights
- verify-install.sh — Post-install verification
- smoke-test-9b.sh — Quick 9B deployment smoke test
- run_full_benchmarks.sh — Run all benchmark suites
- run_v31_ablation.sh — V3.1 ablation study launcher
- validate_benchmarks.py — Benchmark result validation
- derive_ablation.py — Derive ablation conditions from runs
- retrain_cx.py — Retrain C(x) cost field
- retrain_cx_phase0.py — Phase 0 C(x) training
- retrain_lens_from_results.py — Retrain Lens from benchmark results
- collect_lens_training_data.py — Collect embeddings for training
- prepare_lens_training.py — Prepare training data
- lib/config.sh — Shared bash config loader
tests/ — Test suite
- validate_tests.py — Test runner entry point
- conftest.py — Pytest shared fixtures
- infrastructure/
  - test_llm.py — llama-server connectivity tests
  - test_sandbox.py — Sandbox connectivity tests
- integration/
  - test_e2e_flow.py — End-to-end pipeline flow test
  - test_e2e_training.py — End-to-end training test
- v3/ — V3 module unit tests (23 files)
  - test_plan_search.py, test_div_sampling.py, test_budget_forcing.py, test_blend_asc.py, test_reasc.py, test_s_star.py, test_candidate_selection.py, test_failure_analysis.py, test_constraint_refinement.py, test_pr_cot.py, test_derivation_chains.py, test_refinement_loop.py, test_metacognitive.py, test_ace_pipeline.py, test_self_test_gen.py, test_lens_feedback.py, test_embedding_store.py, test_ablation_analysis.py, test_ewc.py, test_replay_buffer.py, test_enhanced_retrain.py, test_phase4_validation.py, test_sandbox_adapter.py
docs/ — Documentation
- ARCHITECTURE.md — Two-layer architecture, component diagrams, data flow
- API.md — HTTP API reference for all 5 services
- CLI.md — CLI usage, streaming output, troubleshooting
- CONFIGURATION.md — All environment variables and settings
- MAP.md — This file
- SETUP.md — Installation guide (Docker, bare-metal, K3s)
- TROUBLESHOOTING.md — Common issues and solutions
- reports/ — Ablation studies, status tracking, migration guides
  - V3_ABLATION_STUDY.md — V3 ablation methodology and results
  - V2_5_ABLATION_STUDY.md — V2.5 Geometric Lens ablation (historical)
  - V2_TO_V2_5_MIGRATION.md — V2 to V2.5 migration guide (historical)
  - V3_STATUS.md — V3 implementation status (historical)
  - V3_1_STATUS.md — V3.1 implementation status
- images/banner.png — README banner image
- images/ATLAS_CLI.png — CLI screenshot
v3_ablation_results/ — Published ablation data
- README.md — Data format documentation
- config.json — Ablation run configuration
- preflight.json — Pre-run system checks
- condition_a_baseline/ — Baseline (54.9%, 599 tasks)
- condition_b_phase1/ — +Phase 1 (67.3%, 599 tasks)
- condition_c_phase1_2/ — +Phase 1+2 (67.3%, 599 tasks)
- condition_d_phase1_3/ — +Phase 1+3 (74.6%, 599 tasks)
- Each condition contains summary.json, v3_lcb/results.json, and v3_lcb/per_task/ (599 per-task JSON files)

Description Tables

Root — Configuration

File	Description
`.aider.model.metadata.json`	Aider model metadata: token limits (32K), cost ($0 — local), provider (openai)
`.aider.model.settings.yml`	Aider behavior: whole-file edit format, repo map enabled, streaming on, temperature 0.3
`.env.example`	Docker Compose env template: model path, ports (8080/8099/8070/30820/8090), context size
`atlas.conf.example`	K3s deployment config: model, GPU layers, parallel slots, NodePorts, namespace
`docker-compose.yml`	5-service stack: llama-server, geometric-lens, v3-service, sandbox, atlas-proxy
`pyproject.toml`	Python package: `atlas` CLI entry point (`atlas.cli.repl:run`), requires Python >= 3.9
`.gitignore`	Ignores: model weights, pycache, .aider* (except config files), logs, .env

Root — Documentation

File	Description
`README.md`	Project overview, 74.6% LCB benchmark, setup instructions, hardware requirements
`CHANGELOG.md`	Release history: V3.0.1 (2026-04-05), V3.0, V2.5, V2
`LICENSE`	GNU Affero General Public License v3.0 (AGPL-3.0)
`CODE_OF_CONDUCT.md`	Contributor Covenant Code of Conduct
`CONTRIBUTING.md`	How to contribute: fork, branch, test, PR workflow

atlas-proxy/ — Agent Loop (Go)

The core of the V3.0.1 CLI. Receives OpenAI-compatible requests from Aider, runs a grammar-constrained agent loop with 8 tools, and routes complex files through the V3 pipeline.

File	Lines	Description
`main.go`	2890	HTTP server, `/v1/chat/completions` handler, verify-repair pipeline, best-of-K, format normalization, error analysis, Lens scoring, sandbox testing
`agent.go`	740	Agent loop iteration, JSON schema generation, system prompt building, LLM calls with grammar constraint, exploration budget, truncation recovery
`tools.go`	905	8 tool definitions (read/write/edit/delete file, run command, search, list dir, plan tasks), per-file tier classifier, V3 routing
`aider_format.go`	697	Converts agent results to Aider whole-file blocks, streams real-time status with icons, project directory detection, delete fast-path
`grammar.go`	192	JSON schema (oneOf: tool_call/text/done) and GBNF grammar for constrained output, tool documentation generation
`types.go`	390	AgentContext, ToolDef, ToolResult, tier definitions (T0-T3), max turns per tier, permission types
`v3_bridge.go`	120	HTTP bridge to Python V3 service with SSE progress streaming, Lens scoring bridge
`v3_adapter.go`	177	Translates file write requests into V3GenerateRequest with project context, framework detection, constraint extraction
`build_verify.go`	157	Per-file-type verification: tsc, py_compile, go build, cargo check, gcc, bash -n. Framework-specific overrides
`project.go`	226	Detects language (Node/Python/Rust/Go/C/Shell), framework (Next.js/Flask/Express), build/dev/test commands
`permissions.go`	150	Allow/deny rules, dangerous pattern detection (rm -rf, .env, credentials), mode-based access
`parallel.go`	213	plan_tasks executor: topological sort, concurrent sub-task execution (15-turn budget each)
`go.mod`	—	Go module definition
`Dockerfile`	—	Multi-stage Go build for containerized deployment
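
The `parallel.go` row describes a topological sort followed by concurrent sub-task execution. As a rough illustration only (written in Python rather than Go, and not taken from the proxy source), the sketch below groups tasks into "waves" whose members have no unmet dependencies and could therefore run concurrently. The function name `execution_waves` and the example task names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; every task in a wave has all prerequisites done.

    `dependencies` maps a task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # unlocks tasks that depended on this wave
    return waves


# Hypothetical plan: two tasks depend on "schema", one depends on both.
plan = {"api": {"schema"}, "ui": {"schema"}, "tests": {"api", "ui"}}
print(execution_waves(plan))  # [['schema'], ['api', 'ui'], ['tests']]
```

Each inner list is a batch that a concurrent executor could dispatch together. The per-task 15-turn budget mentioned in the table is not modelled here.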

atlas/ — Python CLI

Standalone REPL for direct interaction with ATLAS services (without Aider).

File	Description
`cli/repl.py`	Main entry point (`atlas` command). Interactive REPL with /solve, /bench, /status, /help. Pipe mode support.
`cli/client.py`	HTTP client for llama-server, Geometric Lens, sandbox. Health checks, generation (batch + streaming), scoring, sandbox execution.
`cli/display.py`	Terminal formatting: banner, colors, status blocks, prompts, separators
`cli/commands/solve.py`	/solve: generate code from LLM, extract from think blocks, score via Lens, test via sandbox
`cli/commands/bench.py`	/bench: delegates to benchmark.v3_runner with dataset/strategy/task-count args
`cli/commands/status.py`	/status: check health of llama-server, Lens, sandbox

benchmark/ — Benchmark Infrastructure

Runner infrastructure for evaluating LLM code generation across multiple datasets.

File	Description
`runner.py`	Core execution: function mode + stdio mode, LLM API calls, ChatML formatting, code extraction
`v2_runner.py`	V2 benchmark runner: phases 0-6, telemetry, Mode A/B, crash recovery
`v3_runner.py`	V3 benchmark runner: full pipeline with ablation conditions A-F
`v2_report.py`	Markdown report generator from benchmark results
`cli.py`	CLI entry point: `atlas benchmark --humaneval --dry-run` etc.
`config.py`	BenchmarkConfig loaded from atlas.conf
`models.py`	Data models: BenchmarkTask, AttemptResult, TaskResult, BenchmarkRun
`best_of_k.py`	Best-of-K candidate evaluation with scoring
`geo_learning.py`	Geometric learning integration for benchmarks

benchmark/datasets/ — Dataset Loaders

Each loader downloads from HuggingFace (JSON rows API, no pyarrow) and normalizes to BenchmarkTask format.

File	Tasks	Eval Mode	Description
`base.py`	—	—	Abstract BaseDataset class with download, parse, validate
`humaneval.py`	164	function	HumanEval function completion
`mbpp.py`	500	function	MBPP with 3-shot [BEGIN]/[DONE] format
`evalplus_humaneval.py`	164	function	HumanEval+ (EvalPlus augmented tests)
`evalplus_mbpp.py`	500	function	MBPP+ (EvalPlus augmented tests)
`livecodebench.py`	599	stdio	LiveCodeBench v5 from bzantium mirror
`gpqa.py`	198	mcq	GPQA Diamond from OpenAI blob CSV
`ifbench.py`	300	ifbench	IFBench instruction-following with loose eval
`scicode.py`	~80	function	SciCode cross-domain scientific coding

benchmark/analysis/ — Analysis Utilities

File	Description
`cost_analysis.py`	Token cost and electricity cost analysis
`hardware_info.py`	GPU/CPU detection and reporting
`pass_at_k.py`	pass@k metric calculation
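
This map does not say which estimator `pass_at_k.py` uses. A common choice is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a task and c the number that passed. The sketch below implements that formula under that assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which passed."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # With fewer than k failures, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))
```

For k=1 the estimate reduces to c/n, the plain fraction of passing samples.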

benchmark/custom/ — Custom Tasks

File	Description
`tasks.json`	100 custom benchmark tasks
`validate.py`	Validates custom task format

benchmark/v3/ — V3 Pipeline Modules

19 Python modules implementing the V3 code generation pipeline. Each module follows a Config + Event + Controller pattern.

Module	Phase	Description
`plan_search.py`	1A	3-step pipeline: extract constraints -> construct plans -> generate code. 3 plans default, max 7.
`div_sampling.py`	1B	12 perturbations: 4 roles + 4 instructions + 4 styles. Modular selection by candidate index.
`budget_forcing.py`	1C	5 tiers (nothink/light/standard/hard/extreme). Wait injection on premature thinking termination. Energy-to-tier sigmoid mapping.
`blend_asc.py`	2A	Adaptive K from C(x) energy: 4 bands mapping energy to k=1-12 and budget tier.
`reasc.py`	2B	Early stopping: energy < 0.10 AND bottom-10% logprob confidence > -0.5.
`s_star.py`	2C	Tiebreaking: generate edge-case inputs where candidates differ, sandbox both, majority wins.
`candidate_selection.py`	—	4 strategies: lens (min energy), random, logprob (max mean), oracle (first pass).
`failure_analysis.py`	3A	Categorize failures: wrong_algorithm, implementation_bug, edge_case_miss, time_limit, format_error, partial_correct.
`constraint_refinement.py`	3B	Generate refined hypotheses from failure analysis. Cosine distance >= 0.15 prevents repetition.
`pr_cot.py`	3C	4 perspectives (logical_consistency, information_completeness, biases, alternative_solutions) x (analysis + repair) = 8 LLM calls.
`derivation_chains.py`	3D	Decompose into <= 5 sub-problems, sandbox-verify each, compose final. 7+ LLM calls.
`refinement_loop.py`	3E	Orchestrator: FailureAnalysis -> ConstraintRefiner -> CodeGen -> Test -> Learn. 2 iters, 120s budget.
`metacognitive.py`	3F	Model failure pattern library with frequency tracking, compensation injection, effectiveness monitoring.
`ace_pipeline.py`	3G	Evolving playbooks: Generator-Reflector-Curator pipeline with confidence decay.
`self_test_gen.py`	util	Generate test cases from problem description. Multiple parsing fallbacks. 50% majority threshold.
`lens_feedback.py`	util	Online Lens recalibration: collect pass/fail embeddings, trigger retrain at 50-sample intervals.
`embedding_store.py`	util	Binary append-only embedding storage: task_id + candidate_index + label + 4096-dim float32 vector.
`ablation_analysis.py`	util	Bootstrap significance tests, pass rate computation across ablation conditions.
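
The ReASC row (2B) states its stopping rule as two thresholds. The sketch below is one possible reading of that rule, not the code in `reasc.py`. It assumes "bottom-10% logprob confidence" means the mean of the lowest 10% of token log-probabilities; the function names are hypothetical.

```python
import math

ENERGY_THRESHOLD = 0.10       # from the ReASC row above
CONFIDENCE_THRESHOLD = -0.5   # from the ReASC row above


def bottom_fraction_mean(logprobs, fraction=0.10):
    """Mean of the lowest `fraction` of token log-probabilities (assumed reading)."""
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    count = max(1, math.ceil(len(logprobs) * fraction))
    lowest = sorted(logprobs)[:count]
    return sum(lowest) / count


def should_stop_early(energy, logprobs):
    """Stop only when energy is low AND the weakest tokens are still confident."""
    return (
        energy < ENERGY_THRESHOLD
        and bottom_fraction_mean(logprobs) > CONFIDENCE_THRESHOLD
    )


confident = [-0.1] * 9 + [-0.3]
print(should_stop_early(0.05, confident))  # True
print(should_stop_early(0.20, confident))  # False: energy too high
```

Because both conditions must hold, a low-energy candidate with a few very uncertain tokens would still continue through the pipeline.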

geometric-lens/ — Core Service

File	Description
`main.py`	FastAPI server: 26 endpoints for scoring, indexing, routing, caching, pattern management
`pipeline.py`	RAG orchestrator: retrieve chunks + patterns -> collect signals -> estimate difficulty -> route -> generate -> verify
`config.py`	ServerConfig (port 8001), Redis URL, API keys, YAML config loading
`storage.py`	ProjectMetadata CRUD for indexed projects
`verify_loop.py`	Verify-repair loop with retry and escalation
`sandbox_client.py`	HTTP client for sandbox code execution
`sandbox_analysis.py`	Classify sandbox execution results
`requirements.txt`	Dependencies: FastAPI, uvicorn, torch (CPU), pydantic, redis, tree-sitter
`Dockerfile`	Python 3.11-slim, CPU PyTorch, port 8099

geometric-lens/geometric_lens/ — Scoring Models

File	Description
`cost_field.py`	C(x): 4096->512->128->1 MLP (SiLU + Softplus). 2.16M params. Contrastive ranking loss.
`metric_tensor.py`	G(x): PCA(4096->128) + diagonal metric tensor + input-dependent modulation. Code exists, not deployed.
`service.py`	Service layer: lazy model loading, evaluate_combined() (single embedding for C(x)+G(x)), verdict thresholds, hot-reload
`training.py`	train_cost_field() (200 epochs), retrain_cost_field_bce() (production retrain with class weights, early stopping)
`embedding_extractor.py`	Calls llama-server POST /v1/embeddings, handles pooled and per-token responses, mean pooling
`ewc.py`	Elastic Weight Consolidation: Fisher Information Matrix, penalty term, prevents catastrophic forgetting
`correction.py`	Natural gradient correction: -alpha * G_inv * grad_C. PCA projection/unprojection. Correctability score.
`replay_buffer.py`	Domain-stratified reservoir sampling. 30% old / 70% new training mix. JSON persistence.
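
The 2.16M parameter figure for C(x) can be checked from the layer widths alone. The sketch below assumes plain fully connected layers with a bias on each; SiLU and Softplus add no parameters.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected MLP with the given widths."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


total = mlp_param_count([4096, 512, 128, 1])
print(total)  # 2163457
```

The first layer (4096 to 512) accounts for 2,097,664 of those parameters, so nearly all of the model's capacity sits in the projection from the embedding.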

geometric-lens/indexer/ — RAG Indexing

File	Description
`ast_parser.py`	tree-sitter Python AST parsing: classes, functions, imports, top-level blocks. Fallback regex parser.
`tree_builder.py`	Build hierarchical TreeIndex from parsed files. Supports incremental updates.
`bm25_index.py`	Inverted index with BM25 scoring (k1=1.5, b=0.75). CamelCase/snake_case tokenization.
`summarizer.py`	LLM-generated summaries for tree nodes.
`persistence.py`	Save/load TreeIndex + BM25Index as JSON to disk.
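
To make the `bm25_index.py` row concrete, here is a minimal sketch of BM25 scoring with k1=1.5 and b=0.75 and a tokenizer that splits CamelCase and snake_case identifiers. It is an illustration under assumptions, not the repository's code: the IDF variant (the non-negative `log(1 + ...)` form) and the tokenizer regex are choices made for this example.

```python
import math
import re
from collections import Counter

K1, B = 1.5, 0.75  # values from the table above


def tokenize(text):
    """Split on non-alphanumerics, then on CamelCase boundaries; lowercase."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):  # "_" acts as a separator
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", word)
    return [t.lower() for t in tokens]


def bm25_scores(query, docs):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(toks) / avg_len)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append(score)
    return scores


docs = ["def parseHTTPResponse(raw_bytes)", "class UserStore", "get_user_id"]
print(tokenize(docs[0]))
print(bm25_scores("http response", docs))
```

Splitting identifiers before indexing lets a query such as "http response" match `parseHTTPResponse`, which whole-word tokenization would miss.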

geometric-lens/retriever/ — RAG Retrieval

File	Description
`bm25_search.py`	BM25 keyword search: min_score=0.1, top_k=20. Strong match detection (threshold=3.0).
`tree_search.py`	LLM-guided tree traversal: max_depth=6, max_reasoning_calls=40. Scores children 0-10.
`hybrid.py`	Routes between bm25_first, tree_only, and both strategies. Deduplication + score normalization.

geometric-lens/router/ — Confidence Router

File	Description
`route_selector.py`	Thompson Sampling with Beta(alpha,beta) posteriors. 4 routes: CACHE_HIT(1) -> FAST_PATH(50) -> STANDARD(300) -> HARD_PATH(1500).
`difficulty_estimator.py`	Weighted fusion of 4 signals -> D(x). Adjusts weights when Geometric Lens is available.
`signal_collector.py`	Collects: pattern_cache_score, retrieval_confidence, query_complexity, geometric_energy, gx_score.
`feedback_recorder.py`	Records route outcomes to Redis for Thompson Sampling posterior updates.
`fallback_chain.py`	Retry escalation: CACHE_HIT -> FAST_PATH -> STANDARD -> HARD_PATH -> terminal.
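
The `route_selector.py` and `feedback_recorder.py` rows together describe a Thompson Sampling loop. The sketch below shows the general technique only: it keeps posteriors in memory rather than Redis, picks the route with the highest sampled success probability, and ignores the per-route numbers (1/50/300/1500) listed in the table, whose role this map does not define. The class and method names are hypothetical.

```python
import random

ROUTES = ["CACHE_HIT", "FAST_PATH", "STANDARD", "HARD_PATH"]


class RouteSelector:
    """Thompson Sampling over routes with Beta(alpha, beta) posteriors."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        # Beta(1, 1) is a uniform prior over each route's success rate.
        self.posteriors = {route: [1.0, 1.0] for route in ROUTES}

    def select(self):
        """Sample a success rate per route and return the best-looking route."""
        samples = {
            route: self.rng.betavariate(alpha, beta)
            for route, (alpha, beta) in self.posteriors.items()
        }
        return max(samples, key=samples.get)

    def record(self, route, success):
        """Update the posterior: successes raise alpha, failures raise beta."""
        self.posteriors[route][0 if success else 1] += 1.0


selector = RouteSelector(random.Random(0))
selector.record("STANDARD", success=True)
selector.record("FAST_PATH", success=False)
print(selector.posteriors["STANDARD"], selector.posteriors["FAST_PATH"])
```

Sampling from the posterior, rather than taking its mean, is what lets rarely tried routes still get picked occasionally.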

geometric-lens/cache/ — Pattern Cache

File	Description
`pattern_store.py`	Redis-backed storage: STM (100 max), LTM, PERSISTENT tiers. Sorted set management.
`pattern_matcher.py`	BM25 index over pattern summaries. Normalized [0,1] similarity scores.
`pattern_extractor.py`	LLM-driven extraction of reusable patterns from successful task solutions.
`pattern_scorer.py`	Ebbinghaus decay: recency-weighted composite score for STM/LTM promotion.
`co_occurrence.py`	Tracks patterns used together. Graph traversal for linked pattern retrieval.
`consolidator.py`	Category surprise tracking for pattern novelty assessment.
`seed_patterns.py`	Bootstrap patterns for initial cache population.
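
The `pattern_scorer.py` row mentions Ebbinghaus decay. The classic forgetting curve is R = exp(-t / S), with t the time since last use and S a stability constant. The sketch below applies that curve to a pattern score. The composite formula, the 24-hour stability value, and the function names are assumptions for illustration, not the repository's scoring rule.

```python
import math


def retention(elapsed_hours, stability_hours):
    """Ebbinghaus forgetting curve: R = exp(-t / S)."""
    if stability_hours <= 0:
        raise ValueError("stability_hours must be positive")
    return math.exp(-elapsed_hours / stability_hours)


def pattern_score(success_rate, elapsed_hours, stability_hours=24.0):
    """Hypothetical composite: past success rate weighted by recency."""
    return success_rate * retention(elapsed_hours, stability_hours)


for hours in (0, 24, 72):
    print(hours, round(pattern_score(0.9, hours), 3))
```

A pattern that has not been used for several multiples of S decays toward zero, which is the kind of signal a tiered cache could use when deciding what to promote or evict.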

v3-service/ — V3 Pipeline HTTP Wrapper

File	Description
`main.py`	HTTP server (port 8070). Pipeline orchestrator: Phase 0 (probe) -> Phase 2 (allocate K) -> Phase 1 (generate) -> Selection -> Phase 3 (repair). LLMAdapter, EmbedAdapter, SandboxAdapter, BuildVerifier. Imports all 19 V3 modules.
`Dockerfile`	Python 3.11, CPU PyTorch, copies benchmark/ for V3 module access. Port 8070.

sandbox/ — Isolated Code Execution

File	Description
`executor_server.py`	FastAPI server (port 8020). 8 language executors with compilation, pytest/pylint for Python, syntax checking, error classification (15 types), output truncation.
`Dockerfile`	Python 3.11-slim + Node.js 20 + Go 1.22 + Rust stable + gcc/g++. tmpfs workspace, read-only root.

inference/ — llama-server Configuration

File	Description
`Dockerfile.v31`	V3.1 9B model Docker build. Used by docker-compose. Builds llama.cpp from source with CUDA.
`Dockerfile`	Base llama.cpp build with CUDA support.
`Dockerfile.mtp`	Multi-Token Prediction experimental build.
`entrypoint-v3.1-9b.sh`	K3s 9B production entrypoint: flash-attn, mlock, --parallel 4, KV quant (q8_0/q4_0), embeddings, 160K context.
`entrypoint-v3-specdec.sh`	K3s 14B + spec decode entrypoint: Qwen3-14B main + Qwen3-0.6B draft, embeddings patch.
`entrypoint.sh`	Default entrypoint: basic llama-server launch with configurable flags.
`entrypoint-embed.sh`	Dedicated embedding server entrypoint (nomic-embed-text-v1.5).
`entrypoint-mtp.sh`	MTP experimental entrypoint.
`patches/fix-embeddings-spec-decode.patch`	One-line patch: prevents embedding=true from poisoning draft model context in spec decode.
`templates/Qwen3-custom.jinja`	Custom Qwen3 Jinja2 chat template.
`templates/Qwen3-no-think.jinja`	Qwen3 template that suppresses `<think>` blocks.

scripts/ — Automation

File	Description
`install.sh`	Full K3s installation: prerequisites, GPU Operator, namespace, image build, manifest deployment
`uninstall.sh`	K3s teardown and cleanup
`build-containers.sh`	Build all container images and import to K3s
`deploy-9b.sh`	Deploy Qwen3.5-9B to K3s cluster
`generate-manifests.sh`	Generate K3s manifests from atlas.conf via envsubst
`download-models.sh`	Download model weights from HuggingFace
`verify-install.sh`	Post-install health verification
`smoke-test-9b.sh`	Quick smoke test for 9B deployment
`run_full_benchmarks.sh`	Run all benchmark suites sequentially
`run_v31_ablation.sh`	V3.1 ablation study launcher with conditions A-F
`validate_benchmarks.py`	Validate benchmark results for completeness
`derive_ablation.py`	Derive ablation conditions from raw benchmark runs
`retrain_cx.py`	Retrain C(x) cost field from collected embeddings
`retrain_cx_phase0.py`	Phase 0 C(x) initial training (597 embeddings)
`retrain_lens_from_results.py`	Retrain Lens models from benchmark result embeddings
`collect_lens_training_data.py`	Collect pass/fail embeddings from benchmark runs
`prepare_lens_training.py`	Prepare and validate training data format
`lib/config.sh`	Shared bash config: loads atlas.conf, validates paths, sets defaults

tests/ — Test Suite

File	Description
`validate_tests.py`	Test runner entry point
`conftest.py`	Pytest shared fixtures
infrastructure/
`test_llm.py`	llama-server health and generation tests
`test_sandbox.py`	Sandbox execution tests
integration/
`test_e2e_flow.py`	End-to-end pipeline flow test
`test_e2e_training.py`	End-to-end Lens training test
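v3/ — 23 unit test files: one per V3 module, plus tests for EWC, the replay buffer, enhanced retraining, phase 4 validation, and the sandbox adapter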
`test_plan_search.py` `test_div_sampling.py` `test_budget_forcing.py` `test_blend_asc.py` `test_reasc.py` `test_s_star.py` `test_candidate_selection.py` `test_failure_analysis.py` `test_constraint_refinement.py` `test_pr_cot.py` `test_derivation_chains.py` `test_refinement_loop.py` `test_metacognitive.py` `test_ace_pipeline.py` `test_self_test_gen.py` `test_lens_feedback.py` `test_embedding_store.py` `test_ablation_analysis.py` `test_ewc.py` `test_replay_buffer.py` `test_enhanced_retrain.py` `test_phase4_validation.py` `test_sandbox_adapter.py`

docs/ — Documentation

File	Description
`ARCHITECTURE.md`	Two-layer architecture with 13 Mermaid diagrams, component breakdowns, sequence diagrams
`API.md`	HTTP API reference: all endpoints for all 5 services, request/response formats
`CLI.md`	CLI usage, streaming output format, workflow examples, troubleshooting
`CONFIGURATION.md`	Every environment variable across all services, internal constants, Aider config
`MAP.md`	This file — repository file map
`SETUP.md`	Installation: Docker Compose, bare-metal, K3s
`TROUBLESHOOTING.md`	Common issues and solutions

docs/reports/ — Studies, Status, Migration

File	Description
`V3_ABLATION_STUDY.md`	V3 ablation methodology: conditions A-D, 599 tasks, statistical analysis
`V2_5_ABLATION_STUDY.md`	Historical: V2.5 Geometric Lens ablation study
`V2_TO_V2_5_MIGRATION.md`	Historical: V2 to V2.5 migration guide
`V3_STATUS.md`	Historical: V3 implementation tracking
`V3_1_STATUS.md`	V3.1 implementation status and roadmap

v3_ablation_results/ — Published Evidence

Per-task pass/fail data for all V3 ablation conditions. 2,396 task results across 4 conditions. See README for data format.

Condition	Directory	Pass@1	Tasks
A (baseline)	`condition_a_baseline/`	54.9%	599
B (+Phase 1)	`condition_b_phase1/`	67.3%	599
C (+Phase 1+2)	`condition_c_phase1_2/`	67.3%	599
D (+Phase 1+3)	`condition_d_phase1_3/`	74.6%	599

Each condition contains summary.json, v3_lcb/results.json, and 599 per-task JSON files in v3_lcb/per_task/.
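
The totals and differences implied by the table can be checked with a few lines of arithmetic. The sketch below uses only the figures listed above; the variable names are illustrative.

```python
TASKS_PER_CONDITION = 599
pass_at_1 = {"A": 54.9, "B": 67.3, "C": 67.3, "D": 74.6}  # percent, from the table

total_results = TASKS_PER_CONDITION * len(pass_at_1)
gains = {cond: round(pct - pass_at_1["A"], 1) for cond, pct in pass_at_1.items()}

print(total_results)  # 2396
print(gains)          # {'A': 0.0, 'B': 12.4, 'C': 12.4, 'D': 19.7}
```

Read this way, conditions B and C report the same Pass@1, and condition D sits 19.7 percentage points above the baseline.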
