Replace CLI orchestration loops with agent workflow prompts by alanzabihi · Pull Request #118 · superagent-ai/polyresearch

alanzabihi · 2026-04-22T12:14:21Z

Summary

- Replace the multi-step state machines in contribute.rs (443 lines) and lead.rs (367 lines) with agent workflow prompts that call CLI subcommands as shell tools
- Rewrite experiment.md from 9 lines to 43 lines with iteration support, harness-only metric rules, and explicit leave-changes-in-place instructions
- Add contribute-workflow.md (75 lines) and lead-workflow.md (63 lines) as the agent outer loops
- Add spawn_workflow_agent to agent.rs for spawning workflow agents
- Net: 272 insertions, 627 deletions (-355 lines)

Architecture change

The CLI no longer contains orchestration loops. polyresearch contribute and polyresearch lead now:

1. Run setup (clone, init, preflight)
2. Spawn a workflow agent with the appropriate prompt
3. Let the agent call CLI subcommands (polyresearch claim, polyresearch submit, etc.) as tools

All 15+ CLI subcommands stay exactly as-is. The agent composes them into workflows adaptively.
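The flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Role`, `prompt_file`, and `run_role` are hypothetical names, and the setup and agent-spawning steps are passed in as closures so the sketch stays self-contained. Only the two prompt file names come from the summary.

```rust
/// Hypothetical role type; the PR only names the `lead` and `contribute` commands.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    Lead,
    Contribute,
}

/// Maps a role to its workflow prompt. File names are from the PR summary;
/// the mapping function itself is an assumption.
fn prompt_file(role: Role) -> &'static str {
    match role {
        Role::Lead => "lead-workflow.md",
        Role::Contribute => "contribute-workflow.md",
    }
}

/// Runs setup first, then hands control to the workflow agent.
/// There is no orchestration loop here: the agent composes the CLI subcommands.
fn run_role(
    role: Role,
    setup: impl FnOnce() -> Result<(), String>,
    spawn_agent: impl FnOnce(&str) -> Result<(), String>,
) -> Result<(), String> {
    setup()?; // clone, init, preflight
    spawn_agent(prompt_file(role)) // agent calls `polyresearch claim`, `submit`, ... as tools
}
```

If setup fails, the agent is never spawned, which keeps the precision-critical checks in the CLI.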

Lead/contribute independence

Lead and contribute are now fully independent agent sessions:

- Lead stays on the repo root (default branch). Does sync, decide, policy-check, generate.
- Contributor creates worktrees for each thesis. Does claim, experiment, attempt, submit, release.
- Each runs git operations sequentially. No concurrent async loops sharing git state.
- Strict role separation enforced in prompts: contributor never runs lead commands, lead never runs contributor commands.

This eliminates the race condition class (bugs 11, 20, 28, 29, 35, 37, 38) by design.

Motivation

See agent-vs-cli-balance.md for the full analysis. In short: agents fail at precision, CLIs fail at adaptation. The CLI keeps the precision-critical protocol primitives. The agent handles the adaptive multi-step workflows where the combinatorial state space caused 42 bugs in the rigid Rust loops.

Test plan

- 175 unit tests pass (cargo test --lib)
- Clean build, zero warnings
- Manual test: polyresearch contribute on a real project
- Manual test: polyresearch lead on a real project
- Manual test: both running simultaneously from the same directory

Note

Medium Risk
Large behavioral refactor of the CLI’s core lead/contribute execution path plus new git/GitHub automation (sync retries, worktree management, staging/commit). Failures could block or mis-sequence protocol actions, so end-to-end manual runs on real repos are important.

Overview
Shifts polyresearch lead and polyresearch contribute from Rust orchestration loops to prompt-driven workflow agents. Adds new contribute-workflow.md and lead-workflow.md prompts (plus a rewritten experiment.md) and introduces agent::spawn_workflow_agent plus prompt helpers to dynamically inject --once, sleep, max-parallel, and capacity guidance.

Introduces new protocol primitives to support the agent-driven flow. Adds polyresearch resume to recreate/verify a thesis worktree, sync .polyresearch-node.toml, and rewrite .polyresearch/thesis.md; adds polyresearch commit to stage only editable-surface changes and block protected paths before committing; claim/batch-claim now gate on a new duties::claim_gate and seed worktrees with thesis context.

Hardens repo hygiene and coordination. Bootstrap now tracks untracked files created by the setup agent, normalizes PROGRAM/PREPARE line endings, ensures .gitignore ignores .polyresearch-node.toml, and can force-add agent-created helper files. Sync is rewritten to pull/rebase safely and retry pushes on non-fast-forward races, with logic to discard “sync-only” local commits when needed. Thesis generation rejects duplicate titles via normalization, prune can remove worktrees for resolved/rejected theses, submit refuses PRs with no diff vs default branch, and GitHub handling adds enable_issues plus improved rate-limit retry classification/backoff.
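The duplicate-title check mentioned above might look something like this. It is a sketch under assumptions: `normalize_title` and `is_duplicate_title` are hypothetical names, and the normalization shown (lowercasing, treating every non-alphanumeric character as a separator) is inferred from the commit log entry about punctuation, not taken from the actual code.

```rust
/// Lowercases the title and collapses punctuation and whitespace into single spaces.
/// Assumed normalization rule; the real one may differ.
fn normalize_title(title: &str) -> String {
    title
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns true when the candidate matches any existing title after normalization.
fn is_duplicate_title(candidate: &str, existing: &[&str]) -> bool {
    let wanted = normalize_title(candidate);
    existing.iter().any(|title| normalize_title(title) == wanted)
}
```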

Adds a --once cycle guard. New cycle_guard enforces “exactly one thesis cycle” runs by preventing additional claims after a release/submit marks the guard done.
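As a rough illustration of the guard described above, the state can be modeled with two booleans. `CycleGuard` and its methods are hypothetical; the real cycle_guard may persist its state differently.

```rust
/// In-memory sketch of a "one thesis cycle" guard (hypothetical type).
#[derive(Debug)]
struct CycleGuard {
    once: bool,
    done: bool,
}

impl CycleGuard {
    fn new(once: bool) -> Self {
        Self { once, done: false }
    }

    /// Called after a release or submit completes the cycle.
    fn mark_done(&mut self) {
        self.done = true;
    }

    /// A claim is allowed unless this is a --once run whose cycle already finished.
    fn may_claim(&self) -> bool {
        !(self.once && self.done)
    }
}
```

Because the check lives in the CLI rather than the prompt, an agent that ignores its instructions still cannot claim a second thesis.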

Reviewed by Cursor Bugbot for commit 305dc9e. Bugbot is set up for automated code reviews on this repo.

Folds all coordination logic into three high-level commands per the v2 spec:

- `bootstrap <url>`: clone/fork, write templates, init node, spawn setup agent
- `lead`: sync ledger, policy-check PRs, decide PRs, generate theses
- `contribute [url]`: auto-submit, hardware-aware parallelism, claim/resume, dispatch workers

New modules: agent.rs (agent runner + recovery), worker.rs (ThesisWorker lifecycle with setup/run/record/cleanup phases).

Updated NodeConfig with [agent] section, ProtocolConfig with default_branch, main.rs with deferred setup for bootstrap/contribute.

159 tests passing (98 unit + 61 e2e).

The v2 CLI encodes the full coordination protocol as deterministic behavior in bootstrap, lead, and contribute. Agents no longer need the protocol spec or skill file.

…sha None handling

Root cause 1: Recovery functions (recover_from_logs, run_harness_directly) returned ExperimentResult with fabricated observations. Now they return RecoveredMetric (raw data only) and the worker classifies using MetricDirection from WorkerContext. Log recovery without a baseline is conservatively classified as no_improvement.

Root cause 2: contribute passed the deferred-setup placeholder AppContext to duties::check, which read default config values. Now contribute builds a local_ctx with the real ProtocolConfig and ProgramSpec after loading them.

Standalone: env_sha comparison in both decide.rs and lead.rs used filter_map to skip None values, treating None and Some("x") as equal. Fixed to compare Option<String> directly so mixed environments trigger Disagreement.

166 tests passing (103 unit + 63 e2e).
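The env_sha fix can be illustrated with a small before/after pair. Both function names are hypothetical and the slice-based signature is an assumption; only the filter_map-versus-direct-comparison distinction comes from the commit message.

```rust
/// Old behaviour (sketch): None values are dropped before comparing,
/// so [None, Some("x")] looks like agreement.
fn env_shas_agree_skipping_none(shas: &[Option<String>]) -> bool {
    let known: Vec<&String> = shas.iter().filter_map(|sha| sha.as_ref()).collect();
    known.windows(2).all(|pair| pair[0] == pair[1])
}

/// Fixed behaviour (sketch): Option<String> values are compared directly,
/// so a mix of None and Some("x") counts as a disagreement.
fn env_shas_agree(shas: &[Option<String>]) -> bool {
    shas.windows(2).all(|pair| pair[0] == pair[1])
}
```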

…tional log recovery

contribute <url>: after cloning, re-discover RepoRef and rebuild GitHubClient so API calls use the correct owner/name. Only done when a URL is provided; without a URL the existing ctx is already correct.

commit_editable_surface: reset user-configured protected_globs from PROGRAM.md in addition to the four hardcoded runtime paths. Previously only .polyresearch/, .polyresearch-node.toml, PROGRAM.md, and PREPARE.md were reset, silently allowing commits to user-declared protected paths.

recover_from_logs: sort log files by name and take the metric from the last file instead of always picking the max. This avoids encoding a directional assumption (max is wrong for lower_is_better projects).

169 tests passing (104 unit + 65 e2e).
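A minimal sketch of the recover_from_logs change, assuming the logs have already been parsed into (file name, metric) pairs. `metric_from_last_log` is a hypothetical helper, not the function in the PR.

```rust
/// Sorts logs by file name and returns the metric from the last one.
/// No max/min is taken, so no metric direction is assumed.
fn metric_from_last_log(mut logs: Vec<(String, f64)>) -> Option<f64> {
    logs.sort_by(|a, b| a.0.cmp(&b.0));
    logs.last().map(|(_, metric)| *metric)
}
```

With a lower-is-better metric, picking the maximum would report the worst run; taking the most recent file by name sidesteps that.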

…neys

New test infrastructure:

- ScenarioGitHub: stateful mock that mutates in response to API calls so multi-step flows see the effects of prior steps
- mock_agent.sh: deterministic agent controlled by MOCK_AGENT_RESULT env var (improved, no_improvement, crashed, fail)

7 scenario tests covering complete user journeys:

- scenario_bootstrap_fresh: templates + node config created with goal text
- scenario_bootstrap_idempotent: existing PROGRAM.md preserved, missing sections appended
- scenario_contribute_improved: claim + worker dispatch with mock agent
- scenario_contribute_no_improvement: full flow with no_improvement result
- scenario_contribute_agent_failure: agent exit 1 handled gracefully
- scenario_lead_accept_pr: sync + decide accepted + merge + close thesis
- scenario_lead_reject_non_improvement: decide non_improvement + close PR, thesis stays open

Also fixes bootstrap clone_if_needed to not hard-fail on git fetch when no remote exists (best-effort sync).

176 tests passing (104 unit + 65 e2e + 7 scenarios).

…arallelism contract

Root cause: lead.rs reimplemented compute_decision and post-decision actions from decide.rs. The two had already diverged (NonImprovement handling, helper function usage). Extracted execute_decision as a shared pub function in decide.rs, made decide_without_peer_review and decide_with_peer_review pub(crate). lead.rs now calls the shared functions instead of maintaining its own copy. decide.rs also uses execute_decision for its own run().

fork_and_clone: detect whether fork_owner matches the current gh user. Only pass --org when targeting an org account; personal forks use plain gh repo fork without --org.

calculate_parallelism: removed .max(1) so the function returns 0 when available_work is 0, matching the documented contract. The caller already guards the zero case.

181 tests passing (105 unit + 65 e2e + 11 scenarios).
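The calculate_parallelism contract can be shown in a few lines. The two-argument signature is an assumption (the real function is described as hardware-aware and likely takes more inputs); the point is only the removed `.max(1)`.

```rust
/// Sketch: never run more workers than there is capacity or available work.
/// Without a trailing `.max(1)`, zero available work yields zero workers.
fn calculate_parallelism(capacity: usize, available_work: usize) -> usize {
    capacity.min(available_work)
}
```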

…uard, and deduplicate node init

ExperimentResult.baseline is now Option<f64> so the log-recovery path no longer fabricates a 0.0 baseline that could mislead the decision system.

find_harness returns a relative path resolved per work_dir, so the baseline measurement uses its own copy of the harness script instead of the candidate's.

Lead loop uses a shared is_pr_decidable helper from decide.rs that includes the maintainer_rejected check.

Node initialization extracted to a single ensure_node_config in commands/mod.rs.
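To show why an optional baseline matters, here is a sketch of classification with `Option<f64>`. The `classify` function, the variant names of `MetricDirection`, and the string results are assumptions for illustration; the commit messages only name the type and the no_improvement outcome.

```rust
/// Variant names are assumed; the source mentions lower_is_better projects.
#[derive(Clone, Copy)]
enum MetricDirection {
    HigherIsBetter,
    LowerIsBetter,
}

/// Classifies a metric against an optional baseline.
/// A missing baseline is conservatively treated as no improvement,
/// rather than being compared against a fabricated 0.0.
fn classify(metric: f64, baseline: Option<f64>, direction: MetricDirection) -> &'static str {
    let Some(baseline) = baseline else {
        return "no_improvement";
    };
    let improved = match direction {
        MetricDirection::HigherIsBetter => metric > baseline,
        MetricDirection::LowerIsBetter => metric < baseline,
    };
    if improved { "improved" } else { "no_improvement" }
}
```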

…e variant

Auto-submit in contribute.rs now logs warnings and skips submitted_any when push or PR creation fails instead of silently swallowing errors.

commit_editable_surface uses git diff --cached --quiet to detect staged changes only, avoiding false positives from unstaged protected-file modifications.

execute() returns WorkerOutcome::Failed for crashed and infra_failure results instead of misclassifying them as NoImprovement.

…f, and validate auto-submit branch

decide_ready_prs loads the ledger once before the loop and skips the iteration if it is stale in zero-conf mode, matching the guard used by the decide CLI command.

policy_check_open_prs skips PRs that have no thesis_number instead of silently falling through both action branches.

Auto-submit validates the worktree branch starts with the expected thesis/{issue}- prefix before pushing.
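The branch check in the last item could be as small as the following. `branch_matches_thesis` is a hypothetical name; only the thesis/{issue}- prefix comes from the commit message.

```rust
/// Returns true when the branch name starts with `thesis/<issue>-`.
/// The trailing dash keeps issue 42 from matching `thesis/421-...`.
fn branch_matches_thesis(branch: &str, issue: u64) -> bool {
    let prefix = format!("thesis/{issue}-");
    branch.starts_with(prefix.as_str())
}
```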

…ueue depth

Policy check and sync may close PRs or push commits, so re-derive repository state before decide_ready_prs to avoid acting on stale snapshots.

Use saturating_sub for min_queue_depth arithmetic to eliminate any possibility of usize underflow.
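A short sketch of the underflow concern: subtracting a larger `usize` from a smaller one panics in debug builds and wraps in release builds, while `saturating_sub` clamps at zero. `theses_to_generate` and its parameters are hypothetical names for the queue-depth arithmetic.

```rust
/// How many theses are still needed to reach the minimum queue depth.
/// `saturating_sub` returns 0 when the queue is already deeper than required.
fn theses_to_generate(min_queue_depth: usize, open_theses: usize) -> usize {
    min_queue_depth.saturating_sub(open_theses)
}
```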

CLI v0.5.0: bootstrap, lead, and contribute orchestration commands

Bump cli_version in PROGRAM.md to 0.5.0

Root README now walks through the actual usage flow: bootstrap a project (which creates the coordination files), run the lead, run contributors. Removes references to the deleted POLYRESEARCH.md and skill file. CLI README adds a quick-start section with the three high-level commands and organizes the command summary by role.

Update READMEs for v0.5.0 bootstrap/lead/contribute workflow

Update READMEs for v0.5.0

Tighten README for first-time visitors

…m the repo URL

Both commands were cloning directly into cwd when run outside a git repo, which fails if the directory already exists. Now they behave like git clone and create a child directory named after the repository.

Fix bootstrap and contribute to clone into repo-named subdirectory

- Add --capacity, --api-budget, --request-delay, --agent-command flags to contribute, lead, init, and bootstrap via shared NodeOverrides struct
- For contribute/lead these are pure runtime overrides (no file writes)
- For init/bootstrap they write initial values to .polyresearch-node.toml
- Bootstrap now auto-forks when the user lacks push access to the target repo; add --no-fork to skip the check and clone directly
- Add RepoRef::parse_url for URL-based owner/repo extraction
- Update both READMEs with new flag documentation

Closes #65

- Deduplicate URL stripping between parse_remote and parse_url into a shared strip_github_url function
- Add conflicts_with = "fork" to --no-fork so clap rejects contradictory flags at parse time

Move trim_end_matches('/') to the end of the strip_github_url chain so it runs after prefix stripping, not before. The previous ordering ate the slash that is part of the https://github.com/ prefix pattern, causing the prefix strip to fail on inputs like https://github.com/owner/repo/.

Replace strip_github_url (trim_start_matches, returns &str) with strip_github_prefix (strip_prefix, returns Option). The old helper silently passed through non-GitHub URLs; the new one returns None when no known prefix matches. parse_url now also rejects URLs with extra path segments (e.g. /tree/main) and owner-only URLs.
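The behaviour described in the last two commit messages can be sketched as below. The function names match the messages, but the bodies, the prefix list, and the tuple return type are assumptions; the real parse_url is a method on RepoRef.

```rust
/// Returns the part after a known GitHub prefix, or None if no prefix matches.
/// The prefix list here is an assumption for illustration.
fn strip_github_prefix(url: &str) -> Option<&str> {
    const PREFIXES: [&str; 3] = [
        "https://github.com/",
        "http://github.com/",
        "git@github.com:",
    ];
    PREFIXES.iter().find_map(|prefix| url.strip_prefix(*prefix))
}

/// Parses `owner/repo` out of a GitHub URL.
/// Trailing slashes are trimmed only after the prefix is stripped, and
/// owner-only URLs or URLs with extra path segments are rejected.
fn parse_url(url: &str) -> Option<(String, String)> {
    let rest = strip_github_prefix(url.trim())?;
    let rest = rest.trim_end_matches('/');
    let mut parts = rest.split('/');
    let owner = parts.next().filter(|s| !s.is_empty())?;
    let repo = parts.next().filter(|s| !s.is_empty())?;
    if parts.next().is_some() {
        return None; // e.g. /tree/main
    }
    Some((owner.to_string(), repo.to_string()))
}
```

Using `strip_prefix` rather than `trim_start_matches` is what lets non-GitHub URLs surface as `None` instead of passing through unchanged.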

Add CLI flags for node config overrides and auto-fork in bootstrap

Simplify README usage examples to one command per section

Move all agent prompts and document templates into separate .md files under cli/prompts/, embedded at compile time via include_str!. Enrich prompt content with autoresearch quality principles: simplicity criterion, history awareness, trust boundary framing, crash judgment.


cursor · 2026-04-23T10:51:34Z

+                        eprintln!("Warning: could not verify queue depth after agent run: {err}");
+                    }
+                }
+                break;


Lead and contribute loops exit after agent success

High Severity

In continuous mode (once=false), both the lead and contribute outer loops break when the workflow agent exits successfully. The agent prompt says "LOOP FOREVER," but agents will inevitably exit with code 0 due to context window limits. When that happens, the process silently terminates instead of restarting the agent for the next iteration. In contribute.rs, Ok(()) => break exits immediately; in lead.rs, the break at the end of the Ok arm has the same effect once post-checks pass. The Err path correctly restarts, but the Ok path does not, making continuous operation impossible once the agent's context window is exhausted.

Additional Locations (1)

cli/src/commands/contribute.rs#L86-L87

Reviewed by Cursor Bugbot for commit 75e7ab7.
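One possible shape for an outer loop that addresses this finding is sketched below. It is not the fix applied in the PR: `run_outer_loop` is a hypothetical function, the agent is modeled as a closure, and `max_runs` exists only so the sketch terminates.

```rust
/// Runs the agent repeatedly and returns how many times it was started.
/// In --once mode a successful exit ends the loop; in continuous mode a
/// successful exit (for example after context exhaustion) triggers a restart.
fn run_outer_loop<F>(once: bool, max_runs: usize, mut run_agent: F) -> usize
where
    F: FnMut() -> Result<(), String>,
{
    let mut runs = 0;
    loop {
        let result = run_agent();
        runs += 1;
        if once && result.is_ok() {
            break; // exactly one successful cycle was requested
        }
        if runs >= max_runs {
            break; // safety valve for the sketch only
        }
        // Otherwise restart: both Ok (continuous mode) and Err fall through here.
    }
    runs
}
```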


cursor · 2026-04-23T11:08:25Z

+    ".polyresearch-node.toml",
+    "PROGRAM.md",
+    "PREPARE.md",
+];


Commit command missing results.tsv in protected files

Medium Severity

The ALWAYS_PROTECTED list in the new commit.rs omits results.tsv, a critical protocol file that tracks the experiment ledger. If an agent accidentally modifies results.tsv in a worktree, polyresearch commit would include the change. Whether this is caught depends entirely on the PROGRAM.md "cannot_modify" globs being set up correctly, which is a fragile assumption given that bootstrap is agent-driven.

Reviewed by Cursor Bugbot for commit 6fd5553.
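A sketch of what the reviewer's suggestion would amount to: the four hardcoded paths named elsewhere in this PR plus results.tsv. The `is_protected` helper and its matching rules (directory prefix versus exact file name) are assumptions, not the code in commit.rs.

```rust
/// Paths that must never be committed from a worktree.
/// The first four are named in the PR; `results.tsv` is the reviewer's proposed addition.
const ALWAYS_PROTECTED: [&str; 5] = [
    ".polyresearch/",
    ".polyresearch-node.toml",
    "PROGRAM.md",
    "PREPARE.md",
    "results.tsv",
];

/// Entries ending in '/' match everything under that directory;
/// other entries match the exact repo-relative path.
fn is_protected(path: &str) -> bool {
    ALWAYS_PROTECTED.iter().any(|entry| {
        if entry.ends_with('/') {
            path.starts_with(*entry)
        } else {
            path == *entry
        }
    })
}
```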


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 7 total unresolved issues (including 6 from previous reviews).


Reviewed by Cursor Bugbot for commit 305dc9e.

cursor · 2026-04-23T11:32:58Z

+            &thesis.issue.title,
+            thesis.issue.body.as_deref().unwrap_or(""),
+            &prior_attempts,
+        )?;


Claim missing node config sync to worktree

High Severity

claim creates the thesis worktree and writes thesis context, but unlike resume (which calls sync_node_config_to_worktree), it never copies .polyresearch-node.toml into the new worktree. When the workflow agent later changes into the worktree and runs polyresearch commit, polyresearch attempt, or polyresearch submit, those commands call read_node_id(&ctx.repo_root), which looks for the config file in the worktree directory and fails because the file does not exist.

Additional Locations (1)

cli/src/commands/resume.rs#L52-L54

Reviewed by Cursor Bugbot for commit 305dc9e.
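For illustration, a config sync of the kind this finding refers to could be as simple as a file copy. The function name comes from the review comment, but the signature and body below are assumptions, not the code in resume.rs.

```rust
use std::{fs, io, path::Path};

const NODE_CONFIG: &str = ".polyresearch-node.toml";

/// Copies the node config from the repo root into the worktree.
/// Returns Ok(false) when the root has no config to copy (assumed behaviour).
fn sync_node_config_to_worktree(repo_root: &Path, worktree: &Path) -> io::Result<bool> {
    let source = repo_root.join(NODE_CONFIG);
    if !source.exists() {
        return Ok(false);
    }
    fs::copy(&source, worktree.join(NODE_CONFIG))?;
    Ok(true)
}
```

Calling the same helper from both claim and resume would keep the two worktree setups from drifting apart.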

alanzabihi added 30 commits April 20, 2026 12:47

Remove POLYRESEARCH.md and SKILL.md replaced by CLI orchestration

faea286


Merge pull request #60 from superagent-ai/cli-v2-rebuild

04dc9b8

CLI v0.5.0: bootstrap, lead, and contribute orchestration commands

Bump cli_version in PROGRAM.md to 0.5.0 to match Cargo.toml

4963412

Merge pull request #61 from superagent-ai/fix-smoke-version

5635b22

Bump cli_version in PROGRAM.md to 0.5.0

Merge pull request #62 from superagent-ai/cli-v2-rebuild

504f43e

Update READMEs for v0.5.0 bootstrap/lead/contribute workflow

Remove fictional SSH relay pattern from remote machine section

94aed31

Merge pull request #63 from superagent-ai/cli-v2-rebuild

9b8f474

Update READMEs for v0.5.0

Tighten README for first-time visitors

6101767

Merge pull request #64 from superagent-ai/readme-cleanup

ed2a218

Tighten README for first-time visitors

Handle trailing-slash URLs in repo_name_from_url

e505879

Merge pull request #66 from superagent-ai/bootstrap-clone-subdir

63c02f8

Fix bootstrap and contribute to clone into repo-named subdirectory

Extract shared strip_github_url helper, add conflicts_with for --no-fork

46ade57


Merge pull request #67 from superagent-ai/cli-overrides-auto-fork

f029480

Add CLI flags for node config overrides and auto-fork in bootstrap

Simplify README usage examples to one command per section

2faefc0

Merge pull request #68 from superagent-ai/simplify-readme-examples

e3802bc

Simplify README usage examples to one command per section

alanzabihi added 16 commits April 23, 2026 12:06

Keep auto-created node ids aligned with init so contribute cannot re-…

3b2a664

…claim a thesis under a different name.

Follow lead policy-pass with a decide sweep.

92c8358

Add regression tests for resolved thesis claims.

3fe0073

Fix GitHub secondary rate limit retries and cover the pace quota path…

bb8d322

…s with tests.

Resume stale contributor claims before new work so claimed theses sto…

7ac9a69

…p getting stuck.

Merge pull request #167 from superagent-ai/fix-153-lead-decide-follow…

b05d828

…through Fix lead policy-pass without decide follow-through

Merge pull request #168 from superagent-ai/fix-139-stale-claims-on-re…

a412a20

…solved Add regression coverage for resolved thesis claims

Merge pull request #170 from superagent-ai/fix-155-secondary-rate-limit

549c1df

Fix secondary rate limit misclassification and shorten retry backoff

Merge hybrid-agent-cli into issue-154-contribute-resume to resolve PR…

5093445

… conflicts.

Trim HOSTNAME when auto-creating node ids so contribute identity stay…

2c2f57e

…s stable.

Retry lead queue refill before exiting --once so skipped generation g…

2ed1d87

…ets a focused retry and a real failure signal.

Merge pull request #171 from superagent-ai/issue-125-lead-queue-refill

75e7ab7

Fix lead queue refill when generation is skipped

Reject duplicate thesis titles during lead generation so accepted or …

5fadd59

…exhausted work is not proposed again.

Treat punctuation as separators when deduplicating thesis titles.

a020d7c

Merge hybrid-agent-cli into issue-156-duplicate-claim-node to resolve…

b0dff3b

… scenario test conflicts.

Unify duty context and worktree setup so lead claim gates stay consis…

b4e8d20

…tent and resume prep cannot drift.

cursor Bot reviewed Apr 23, 2026


alanzabihi added 8 commits April 23, 2026 12:55

Merge pull request #165 from superagent-ai/issue-138-duplicate-thesis

f687db1

Reject duplicate thesis titles in polyresearch generate

Merge hybrid-agent-cli into issue-156-duplicate-claim-node to resolve…

e04d244

… refreshed scenario test conflicts.

Merge pull request #166 from superagent-ai/issue-156-duplicate-claim-…

8b52294

…node Fix duplicate claim with inconsistent node name (#156)

Merge hybrid-agent-cli into issue-154-contribute-resume to resolve re…

52a5e59

…freshed PR conflicts.

Enforce the contribute --once thesis-cycle limit in Rust so workflow …

1dcaa09

…agents cannot claim a second thesis, and cover the guard with unit, e2e, and scenario tests.

Keep the contribute --once guard cleanup path intact when the workflo…

57299e7

…w task itself aborts before returning.

Resolve the lead workflow merge conflict against hybrid-agent-cli so …

f16521e

…the once-cycle guard branch rebases cleanly.

Merge pull request #172 from superagent-ai/fix-once-cycle-guard

6fd5553

Fix contribute --once after the first thesis cycle

cursor Bot reviewed Apr 23, 2026


alanzabihi added 2 commits April 23, 2026 13:17

Separate claim gating from full duties so lead claims are not blocked…

d6ea711

… by unrelated orchestration work.

Merge pull request #169 from superagent-ai/issue-154-contribute-resume

305dc9e

Fix contributor resume flow for stale claims

cursor Bot reviewed Apr 23, 2026


alanzabihi force-pushed the main branch from 0d77b1b to 6916711 April 23, 2026 12:39

alanzabihi wants to merge 202 commits into main from hybrid-agent-cli
