Commit 75bc298

Merge pull request #366 from algorithmicsuperintelligence/feat-add-rich-feedback-example
Add rich feedback mode to k_module_problem example
2 parents fda1963 + 93b798e commit 75bc298

File tree (3 files changed, +75 / -3 lines)

- examples/k_module_problem/README.md
- examples/k_module_problem/evaluator.py
- examples/k_module_problem/iterative_agent.py
examples/k_module_problem/README.md

Lines changed: 19 additions & 0 deletions

````diff
@@ -166,6 +166,25 @@ This establishes the "no learning" baseline. Any method that beats this is demon

 **Key insight**: While OpenEvolve takes more iterations on average (52.3 vs 13), it has a **100% success rate** compared to iterative refinement's 33%. The evolutionary approach's population diversity ensures it eventually escapes local optima that trap single-trajectory methods.

+### Rich Feedback Mode: Proving Attribution Matters
+
+To verify that feedback attribution is the key factor, we added a `RICH_FEEDBACK=1` mode that tells the agent exactly which modules are correct/incorrect:
+
+```bash
+RICH_FEEDBACK=1 python run_iterative_trials.py --trials 3 --iterations 100
+```
+
+| Method | Success Rate | Avg Iterations |
+|--------|-------------|----------------|
+| **Iterative (no feedback)** | 33% | 13 (when found) |
+| **Iterative (rich feedback)** | **100%** | **3** |
+
+With rich feedback, iterative refinement achieves **100% success rate in only 3 iterations** - dramatically faster than OpenEvolve's 52 iterations! This proves that:
+
+1. **Feedback attribution is the key factor**, not the optimization method
+2. When feedback is attributable, iterative refinement is highly effective
+3. Evolution is necessary when feedback is NOT attributable (you can't tell which component is wrong)
+
 ## Why This Matters

 This example illustrates when you should prefer evolutionary approaches:
````
examples/k_module_problem/evaluator.py

Lines changed: 30 additions & 2 deletions

```diff
@@ -9,13 +9,21 @@
 This creates a challenging landscape for iterative refinement but
 allows evolutionary crossover to combine good "building blocks"
 from different individuals.
+
+Set RICH_FEEDBACK=1 to enable rich feedback mode, which tells you
+exactly which modules are correct/incorrect. This demonstrates that
+iterative refinement works well when feedback is attributable.
 """

+import os
 import sys
 import time
 import traceback
 import importlib.util

+# Rich feedback mode - when enabled, reveals which modules are correct
+RICH_FEEDBACK = os.environ.get("RICH_FEEDBACK", "0") == "1"
+
 # The correct solution (hidden from the optimizer)
 # This represents the "optimal" pipeline configuration discovered through
 # extensive testing/domain expertise
@@ -141,14 +149,34 @@ def score_config(config: dict) -> tuple:

 def build_artifacts(config: dict, correct_count: int, module_results: dict, eval_time: float) -> dict:
     """
-    Build artifacts that provide useful feedback without revealing
-    exactly which modules are correct.
+    Build artifacts that provide useful feedback.
+
+    In normal mode: Only reveals how many modules are correct, not which ones.
+    In rich feedback mode (RICH_FEEDBACK=1): Reveals exactly which modules are correct/incorrect.
     """
     artifacts = {}

     # Configuration summary
     artifacts["configuration"] = str(config)

+    # Rich feedback mode - reveals which modules are correct/incorrect
+    if RICH_FEEDBACK:
+        correct_modules = [m for m, is_correct in module_results.items() if is_correct]
+        incorrect_modules = [m for m, is_correct in module_results.items() if not is_correct]
+
+        artifacts["module_feedback"] = {
+            "correct": correct_modules,
+            "incorrect": incorrect_modules,
+        }
+
+        if incorrect_modules:
+            hints = []
+            for module in incorrect_modules:
+                hints.append(f"'{module}' is WRONG - try a different option from {VALID_OPTIONS[module]}")
+            artifacts["actionable_hints"] = hints
+        else:
+            artifacts["actionable_hints"] = ["All modules are correct!"]
+
     # Score feedback - tells you how many are correct, but not which ones
     if correct_count == NUM_MODULES:
         artifacts["status"] = "PERFECT! All modules correctly configured!"
```
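As a quick sanity check of the new branch, the sketch below calls `build_artifacts` directly with a fabricated result. This is not part of the commit: it assumes you run it from `examples/k_module_problem`, that `VALID_OPTIONS` maps each module name to an indexable collection of options (as the hint string suggests), and it patches the module-level flag because `RICH_FEEDBACK` is read from the environment only once, at import time.

```python
import evaluator

# Force rich feedback regardless of the current environment; the env var
# is only consulted when evaluator.py is imported.
evaluator.RICH_FEEDBACK = True

# Fabricate a result: mark the first module wrong and the rest correct.
modules = list(evaluator.VALID_OPTIONS)
module_results = {m: (i != 0) for i, m in enumerate(modules)}
config = {m: evaluator.VALID_OPTIONS[m][0] for m in modules}  # arbitrary picks

artifacts = evaluator.build_artifacts(
    config,
    correct_count=sum(module_results.values()),
    module_results=module_results,
    eval_time=0.0,
)

print(artifacts["module_feedback"])   # lists which modules are correct/incorrect
print(artifacts["actionable_hints"])  # one hint naming the wrong module and its valid options
```

With `evaluator.RICH_FEEDBACK = False` the same call would omit both keys and fall back to the count-only feedback.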

examples/k_module_problem/iterative_agent.py

Lines changed: 26 additions & 1 deletion

```diff
@@ -64,6 +64,26 @@ def write_program(program_path: str, code: str) -> None:
         f.write(code)


+def format_rich_feedback(artifacts: dict) -> str:
+    """Format rich feedback if available (RICH_FEEDBACK=1)."""
+    if "module_feedback" not in artifacts:
+        return ""
+
+    feedback = artifacts["module_feedback"]
+    hints = artifacts.get("actionable_hints", [])
+
+    result = "\n## DETAILED MODULE FEEDBACK (Rich Feedback Mode)\n"
+    result += f"- CORRECT modules: {feedback.get('correct', [])}\n"
+    result += f"- INCORRECT modules: {feedback.get('incorrect', [])}\n"
+
+    if hints:
+        result += "\n### Actionable Hints:\n"
+        for hint in hints:
+            result += f"- {hint}\n"
+
+    return result
+
+
 def create_improvement_prompt(
     current_code: str,
     metrics: dict,
@@ -108,6 +128,7 @@ def create_improvement_prompt(
 - Score: {metrics.get('combined_score', 0):.2%}
 - Status: {artifacts.get('status', 'N/A')}
 - Suggestion: {artifacts.get('suggestion', 'N/A')}
+{format_rich_feedback(artifacts)}
 {history_str}

 ## Your Task
@@ -205,7 +226,11 @@ def run_iterative_refinement(

         # Evaluate current program
         eval_result = evaluate(str(current_program_path))
-        metrics = eval_result.get("metrics", {})
+        # Handle both flat (success) and nested (error) return formats
+        if "metrics" in eval_result:
+            metrics = eval_result["metrics"]
+        else:
+            metrics = {k: v for k, v in eval_result.items() if k != "artifacts"}
         artifacts = eval_result.get("artifacts", {})

         score = metrics.get("combined_score", 0)
```
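To see what the new prompt section renders as, here is a small sketch of calling `format_rich_feedback` with a hand-written artifacts dict. It is not part of the commit; it assumes you run it from `examples/k_module_problem` with the module's own imports available, and the module names and options in the dict are placeholders.

```python
from iterative_agent import format_rich_feedback

# Artifacts shaped like what build_artifacts emits under RICH_FEEDBACK=1,
# but with placeholder module names and options.
artifacts = {
    "module_feedback": {"correct": ["module_a", "module_c"], "incorrect": ["module_b"]},
    "actionable_hints": ["'module_b' is WRONG - try a different option from ['fast', 'accurate']"],
}

print(format_rich_feedback(artifacts))
# ## DETAILED MODULE FEEDBACK (Rich Feedback Mode)
# - CORRECT modules: ['module_a', 'module_c']
# - INCORRECT modules: ['module_b']
#
# ### Actionable Hints:
# - 'module_b' is WRONG - try a different option from ['fast', 'accurate']
```

Without the `module_feedback` key the function returns an empty string, so in normal mode the improvement prompt is unchanged.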
