A session-based CLI for generating workflow candidates, auditing them, and driving a windtunnel + release-gate loop with an explicit Forge state machine.
This repo focuses on explicit state routing, shared blackboard (ForgeState), and windtunnel-driven gates (Oracle + regression) for iterative improvement.
- Forge state machine: Explicit Node A-G pipeline with traceable routing and persisted ForgeState.
- Windtunnel contract: Standardized `WindTunnelSpec` and `WindTunnelReport` artifacts.
- Release gates: Must-not-regress, SLO, and quality checks with failure bundles.
- Patch verification loop: Any HARD gate failure triggers patch + re-test, even when `--rounds 1`.
- Failure bundles: Reproducible context + failing tasks + pointers for regression.
- Python 3.11+
- Node.js + npm (only needed if you want the GitHub MCP server)
```
# Windows
python -m venv venv
.\venv\Scripts\activate
pip install -e .

.\venv\Scripts\python.exe -m evo.main init-session --name demo

.\venv\Scripts\python.exe -m evo.main forge-run \
  --session sessions/demo \
  --rounds 1 \
  --population 3 \
  --suite rag_windtunnel_v1 \
  --replications 3 \
  --seed 42 \
  --perturb \
  --budget-sweep "tool_calls=2,4;tokens=80,160" \
  --perturb-sweep "miss_prob=0.0,0.3;noise_docs=0,2" \
  --patch-mode strategy \
  --reset
```

What you get:
- `sessions/<name>/forge_state.json` (shared blackboard)
- `sessions/<name>/state_machine/trace.md` (node routing trace)
- windtunnel evidence + report under `sessions/<name>/evidence/`
- release gate results under `sessions/<name>/gates/`
- recommendation artifacts when gates pass
```
.\venv\Scripts\python.exe -m evo.main init-session --name demo
.\venv\Scripts\python.exe -m evo.main generate --session sessions/demo --n 3
.\venv\Scripts\python.exe -m evo.main audit --session sessions/demo
.\venv\Scripts\python.exe -m evo.main report --session sessions/demo
.\venv\Scripts\python.exe -m evo.main forge-run --session sessions/demo --rounds 2 --population 3
.\venv\Scripts\python.exe -m evo.main iterate --session sessions/demo --rounds 2 --population 3
.\venv\Scripts\python.exe -m evo.main run --session sessions/demo --suite rag_windtunnel_v1 --replications 3
.\venv\Scripts\python.exe -m evo.main metrics --session sessions/demo --suite rag_windtunnel_v1
```

Nodes follow the explicit A-G route:
- A Ingest: Loads requirements into `ForgeState.user_requirements`
- B Generate: Produces candidate workflows and sets `current_workflow`
- C Static Audit: Audits + writes `required_fixes` (deduped by rule_id)
- D Windtunnel: Runs the suite for `current_workflow`, writes report + stats
- E Synthesis: Converts failures into actionable WDR updates + gate decisions
- F Revise/Patch: Applies targeted patches and prepares for re-audit
- G Package: Produces the final recommendation once gates pass
Routing:
- C Reject -> F -> C (re-audit), with a no-progress watchdog that falls back to regeneration
- C Pass -> D -> E -> (F if HARD gate fail) -> C -> ...
- Only a full gate pass triggers G Package
Trace file: `sessions/<name>/state_machine/trace.md`
Schema:
- `evo/windtunnel/spec.py` (WindTunnelSpec)
- `evo/windtunnel/report.py` (WindTunnelReport)

Artifacts:
- `sessions/<name>/evidence/workflow/<candidate_id>/windtunnel/spec_<suite>.json`
- `sessions/<name>/evidence/workflow/<candidate_id>/windtunnel/report_<suite>.json`
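As a rough sketch of what these two contracts carry, the shapes below mirror the CLI flags and gate inputs used in this README. All field names are assumptions for illustration; the real schemas live in `evo/windtunnel/spec.py` and `evo/windtunnel/report.py`:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class WindTunnelSpec:
    """Illustrative input contract: what to run and how hard to stress it."""
    suite: str
    replications: int = 3
    seed: int = 42
    perturb: bool = False
    budget_sweep: dict[str, list[int]] = field(default_factory=dict)

@dataclass
class WindTunnelReport:
    """Illustrative output contract: aggregated results for one candidate."""
    candidate_id: str
    suite: str
    pass_mean: float
    loop_rate: float
    failing_task_ids: list[str] = field(default_factory=list)

# Artifacts round-trip through JSON (spec_<suite>.json / report_<suite>.json).
spec = WindTunnelSpec(suite="rag_windtunnel_v1",
                      budget_sweep={"tool_calls": [2, 4], "tokens": [80, 160]})
spec_json = json.dumps(asdict(spec), indent=2)
```

Standardizing both sides of the contract is what lets gates, failure bundles, and regression replay consume any candidate's run the same way.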
Gate rules are configured in `gate_rules.yaml`:
- Must-not-regress: loop rate / unauthorized tool calls / injection success
- SLO: latency, cost, and timeout thresholds
- Quality: pass mean >= baseline - epsilon
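A hedged sketch of how the three gate families might be evaluated against a report (metric names, thresholds, and the function shape are invented for illustration; the real evaluation lives in `evo/gates.py` driven by `gate_rules.yaml`):

```python
def evaluate_gates(report: dict, baseline: dict,
                   slo: dict, epsilon: float = 0.02) -> dict:
    """Return per-gate verdicts; any HARD failure should trigger patch + re-test."""
    results = {}
    # Must-not-regress: these rates may never exceed the recorded baseline.
    for key in ("loop_rate", "unauthorized_tool_calls", "injection_success"):
        results[key] = report[key] <= baseline[key]
    # SLO: hard ceilings, e.g. latency / cost / timeout counts.
    results["slo"] = all(report[k] <= slo[k] for k in slo)
    # Quality: mean pass rate must stay within epsilon of the baseline.
    results["quality"] = report["pass_mean"] >= baseline["pass_mean"] - epsilon
    results["passed"] = all(results.values())
    return results
```

The epsilon band on quality lets small noise-level dips through while the must-not-regress checks stay strict.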
Gate outputs:
- `sessions/<name>/gates/last_result.json`
- `sessions/<name>/gates/baseline.json`
When a HARD gate fails, a failure bundle is created:
```
sessions/<name>/failure_bundles/<candidate_id>/<timestamp>/
├── context.json
├── failing_tasks.json
└── pointers.json
```
This enables targeted regression and replay against worst-case sweep points.
```
.
├── evo/                        # Core package
│   ├── forge/                  # ForgeState + explicit state machine
│   │   ├── state.py            # ForgeState schema (blackboard)
│   │   └── machine.py          # Node A-G routing + verification loop
│   ├── windtunnel/             # Spec/report contracts
│   │   ├── spec.py             # WindTunnelSpec (inputs)
│   │   └── report.py           # WindTunnelReport (outputs)
│   ├── oracle/                 # Windtunnel simulator + sweeps
│   │   └── rag_mini_runner.py  # Runs suite + aggregates + failure stats
│   ├── gates.py                # Release gate evaluation + baselines
│   ├── metrics.py              # Metrics extraction from evidence
│   ├── audit_engine.py         # Ruleset audit + Oracle gate checks
│   ├── patching.py             # Patch logic for workflows/proposals
│   ├── recommend.py            # Recommendation + gate-aware output
│   ├── generate.py             # Candidate generation
│   ├── iterate.py              # Legacy iterate wrapper
│   ├── core.py                 # Core API used by CLI
│   ├── main.py                 # Typer CLI entry
│   └── models.py               # Pydantic models (WorkflowIR, Metrics, etc.)
├── eval_suites/                # Windtunnel suite definitions (JSON)
├── sessions/                   # Per-run outputs (not committed)
├── gate_rules.yaml             # Release gate thresholds
├── ruleset.yaml                # Proposal ruleset (legacy)
├── ruleset_workflow.yaml       # Workflow ruleset (current)
└── README.md
```
Key runtime outputs (generated under `sessions/<name>/`):

```
sessions/<name>/
├── forge_state.json          # ForgeState blackboard snapshot
├── state_machine/trace.md    # Node routing trace
├── evidence/workflow/<id>/   # Runs, sweeps, windtunnel report/spec
├── gates/                    # Baseline + last gate result
├── failure_bundles/          # Replayable failure bundles
└── recommendation.*          # Final recommendation (when gates pass)
```
- Use the venv Python for all CLI runs in this repo.
- Session artifacts can be large; do not commit `sessions/` or log files.
- The LLM generator requires `google-genai` and a valid `GEMINI_API_KEY`/`GOOGLE_API_KEY`.
Internal MVP.