Commit 16df877

Refresh planning evidence packets without preserving stale context
Flow now carries durable source-backed planning/review evidence packets while keeping the public workflow surface stable. Same-id packet updates refresh wholesale so replans can retract stale context, and prompt guidance separates runtime-owner persistence from read-only evidence gathering.

Constraint: Add evidence-packet planning context without adding commands, tools, state paths, package exports, or dependency versions
Constraint: Keep zod aligned with @opencode-ai/plugin; no dependency-version changes in this release
Rejected: Union same-id evidence packet arrays | stale source refs and selected/excluded context would survive replans
Rejected: Reuse one prompt fragment across runtime-owner and read-only roles | it gives read-only reviewers/researchers contradictory persistence instructions
Rejected: Leave broad raw-schema ceilings after packet growth | oversized budgets hide unrelated future schema drift
Confidence: high
Scope-risk: moderate
Directive: Treat same-id evidence packets as refreshes; use new ids for additive evidence and keep runtime-tool persistence instructions out of read-only prompt surfaces
Tested: bun run check
Not-tested: Live GitHub-hosted CI/release workflow run for tag v2.0.9 before push
1 parent 7e7d430 commit 16df877

24 files changed

Lines changed: 465 additions & 88 deletions

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -2,6 +2,26 @@

 ## [Unreleased]

+## [2.0.9] - 2026-05-06
+
+Refresh planning evidence packets without preserving stale context
+
+Flow 2.0.9 turns planning context evidence into an explicit durable packet ledger while keeping the workflow surface stable. Planning, execution, review, and final-review schemas can now carry source-backed evidence packets for selected context, exclusions, relationship hypotheses, ambiguity notes, covered findings, and validation evidence, and runtime planning context merges those packets through a shared domain helper instead of duplicating merge behavior across transitions.
+
+The release also closes the review risks found during hardening. Same-id evidence packets now refresh wholesale so replans can retract stale source refs or selected/excluded context instead of unioning obsolete evidence forever. Prompt guidance is split between runtime-owner and read-only roles, so planning researcher and reviewer prompts return evidence for a planner/coordinator/runtime owner to persist rather than telling read-only roles to call planning runtime tools. Tool schema budgets were tightened around the measured evidence-packet growth so future unrelated schema bloat still fails fast.
+
+Constraint: Add source-backed planning/review evidence packets without adding commands, tools, state paths, package exports, or dependency versions
+Constraint: Keep `zod` aligned with `@opencode-ai/plugin`; no dependency-version changes in this patch
+Rejected: Preserve same-id packet arrays by unioning old and new context | stale refs and selected/excluded context would survive replans and weaken evidence accuracy
+Rejected: Reuse one prompt fragment for runtime owners and read-only roles | it gives reviewers/researchers contradictory persistence instructions
+Rejected: Leave broad raw-schema ceilings after evidence-packet growth | oversized budgets hide unrelated future tool-schema drift
+Confidence: high
+Scope-risk: moderate
+Reversibility: clean
+Directive: Treat same-id evidence packets as refreshes, not append logs; use new packet ids for additive evidence and keep runtime-tool persistence instructions out of read-only prompt surfaces
+Tested: `bun run typecheck`; `bun run lint`; `bun test tests/runtime/evidence-packets.test.ts tests/config/tool-schemas.test.ts tests/config/prompt-contracts.test.ts tests/runtime-hooks.test.ts tests/runtime/workflow-core-reducer.test.ts` (54 pass, 950 expect calls); `bun run check`
+Not-tested: Live GitHub-hosted CI/release workflow runs for tag `v2.0.9` before push
+
 ## [2.0.8] - 2026-05-06

 Ground final review coverage in canonical evidence

docs/releases/v2.0.9.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# v2.0.9
+
+Refresh planning evidence packets without preserving stale context
+
+Flow 2.0.9 turns planning context evidence into an explicit durable packet ledger while keeping the workflow surface stable. Planning, execution, review, and final-review schemas can now carry source-backed evidence packets for selected context, exclusions, relationship hypotheses, ambiguity notes, covered findings, and validation evidence, and runtime planning context merges those packets through a shared domain helper instead of duplicating merge behavior across transitions.
+
+The release also closes the review risks found during hardening. Same-id evidence packets now refresh wholesale so replans can retract stale source refs or selected/excluded context instead of unioning obsolete evidence forever. Prompt guidance is split between runtime-owner and read-only roles, so planning researcher and reviewer prompts return evidence for a planner/coordinator/runtime owner to persist rather than telling read-only roles to call planning runtime tools. Tool schema budgets were tightened around the measured evidence-packet growth so future unrelated schema bloat still fails fast.
+
+Constraint: Add source-backed planning/review evidence packets without adding commands, tools, state paths, package exports, or dependency versions
+Constraint: Keep `zod` aligned with `@opencode-ai/plugin`; no dependency-version changes in this patch
+Rejected: Preserve same-id packet arrays by unioning old and new context | stale refs and selected/excluded context would survive replans and weaken evidence accuracy
+Rejected: Reuse one prompt fragment for runtime owners and read-only roles | it gives reviewers/researchers contradictory persistence instructions
+Rejected: Leave broad raw-schema ceilings after evidence-packet growth | oversized budgets hide unrelated future tool-schema drift
+Confidence: high
+Scope-risk: moderate
+Reversibility: clean
+Directive: Treat same-id evidence packets as refreshes, not append logs; use new packet ids for additive evidence and keep runtime-tool persistence instructions out of read-only prompt surfaces
+Tested: `bun run typecheck`; `bun run lint`; `bun test tests/runtime/evidence-packets.test.ts tests/config/tool-schemas.test.ts tests/config/prompt-contracts.test.ts tests/runtime-hooks.test.ts tests/runtime/workflow-core-reducer.test.ts` (54 pass, 950 expect calls); `bun run check`
+Not-tested: Live GitHub-hosted CI/release workflow runs for tag `v2.0.9` before push
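
The "refresh wholesale" rule described in these release notes can be illustrated with a minimal sketch. The trimmed `EvidencePacket` shape and the `refreshEvidencePackets` name are assumptions made for illustration; the shared domain helper itself is not shown in this diff, so this is not its actual implementation.

```typescript
// Hypothetical, trimmed-down packet shape; the real schema carries more fields.
interface EvidencePacket {
  id: string;
  summary: string;
  sourceRefs?: string[];
  selectedContext?: string[];
  excludedContext?: string[];
}

// Refresh-by-id (assumed behavior): an incoming packet replaces any existing
// packet with the same id as a whole, so fields the replan dropped disappear.
// Packets with new ids are appended; a refreshed packet keeps its position.
function refreshEvidencePackets(
  existing: readonly EvidencePacket[],
  incoming: readonly EvidencePacket[],
): EvidencePacket[] {
  const byId = new Map<string, EvidencePacket>();
  for (const packet of existing) byId.set(packet.id, packet);
  for (const packet of incoming) byId.set(packet.id, packet);
  return Array.from(byId.values());
}

const beforeReplan: EvidencePacket[] = [
  { id: "ctx-1", summary: "initial scan", sourceRefs: ["src/a.ts", "src/stale.ts"] },
];
const replan: EvidencePacket[] = [
  { id: "ctx-1", summary: "replanned scan", sourceRefs: ["src/a.ts"] },
  { id: "ctx-2", summary: "additive evidence", sourceRefs: ["src/b.ts"] },
];
const afterReplan = refreshEvidencePackets(beforeReplan, replan);
```

Under these assumptions, `ctx-1` no longer references `src/stale.ts` after the replan, which is the retraction a union of old and new arrays would have prevented, while `ctx-2` shows the "new id for additive evidence" path.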

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "opencode-plugin-flow",
-  "version": "2.0.8",
+  "version": "2.0.9",
   "description": "Stateful planning and execution workflow plugin for OpenCode",
   "type": "module",
   "main": "dist/index.js",

src/core/workflow/reducer.ts

Lines changed: 3 additions & 8 deletions
@@ -1,5 +1,6 @@
 import type { Plan, Session } from "../../workflow/contracts";
 import { SessionSchema } from "../../workflow/contracts";
+import { mergePlanningContext } from "../../workflow/domain";
 import type { WorkflowEvent } from "./events";
 import { createInitialWorkflowState, type WorkflowState } from "./state";

@@ -58,10 +59,7 @@ function reducePlanApplied(
     approval: "pending",
     closure: null,
     notes: [],
-    planning: {
-      ...state.planning,
-      ...event.planning,
-    },
+    planning: mergePlanningContext(state.planning, event.planning ?? {}),
     execution: clearExecutionProjection(state),
     timestamps: {
       ...state.timestamps,
@@ -169,10 +167,7 @@ export function applyWorkflowEvent(
   const current = assertState(state, event);
   return parseState({
     ...current,
-    planning: {
-      ...current.planning,
-      ...event.planning,
-    },
+    planning: mergePlanningContext(current.planning, event.planning),
     timestamps: {
       ...current.timestamps,
       updatedAt: event.recordedAt,
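
To see why the reducer swaps the object spread for a helper, the sketch below contrasts the two. The body of `mergePlanningContext` is not part of this diff, so `mergePlanningContextSketch`, the reduced `PlanningContext` shape, and the by-id merge are assumptions inferred from the commit message, not the real helper.

```typescript
// Reduced, hypothetical shapes for illustration only.
interface EvidencePacket {
  id: string;
  summary: string;
}
interface PlanningContext {
  research?: string[];
  evidencePackets?: EvidencePacket[];
}

// The removed reducer code: a shallow spread. If the update carries
// evidencePackets, the whole array is replaced, including unrelated ids.
function shallowMerge(current: PlanningContext, update: PlanningContext = {}): PlanningContext {
  return { ...current, ...update };
}

// Assumed helper behavior: other keys still override shallowly, but packets
// merge by id, so same-id packets refresh and other packets survive.
function mergePlanningContextSketch(
  current: PlanningContext,
  update: PlanningContext = {},
): PlanningContext {
  const merged: PlanningContext = { ...current, ...update };
  if (update.evidencePackets) {
    const byId = new Map<string, EvidencePacket>();
    for (const packet of current.evidencePackets ?? []) byId.set(packet.id, packet);
    for (const packet of update.evidencePackets) byId.set(packet.id, packet);
    merged.evidencePackets = Array.from(byId.values());
  }
  return merged;
}

const currentPlanning: PlanningContext = {
  research: ["read reducer"],
  evidencePackets: [
    { id: "a", summary: "kept as-is" },
    { id: "b", summary: "old" },
  ],
};
const planningUpdate: PlanningContext = { evidencePackets: [{ id: "b", summary: "refreshed" }] };

const viaSpread = shallowMerge(currentPlanning, planningUpdate);
const viaHelper = mergePlanningContextSketch(currentPlanning, planningUpdate);
```

With these inputs the spread keeps only packet `b`, while the sketched helper keeps `a` and refreshes `b`. Routing both reducer transitions through one function also keeps that behavior from drifting between `reducePlanApplied` and `applyWorkflowEvent`.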

src/prompt-system-context.ts

Lines changed: 32 additions & 1 deletion
@@ -1,5 +1,9 @@
 import type { StackStandardsProfileCacheValue } from "./runtime/application/stack-standards-profile";
-import type { Session, StandardsProfile } from "./runtime/schema";
+import type {
+  EvidencePacket,
+  Session,
+  StandardsProfile,
+} from "./runtime/schema";
 import { deriveSessionViewModel } from "./runtime/summary";

 const FLOW_RUNTIME_CONTEXT_MARKER =
@@ -50,6 +54,24 @@ function compactStandardsGaps(
   return gaps.length > 0 ? gaps.join("; ") : null;
 }

+function compactEvidencePackets(
+  packets: readonly EvidencePacket[],
+  limit = 3,
+): string | null {
+  if (packets.length === 0) {
+    return null;
+  }
+  const latest = packets
+    .slice(-limit)
+    .map((packet) => {
+      const purpose = packet.purpose ? `/${packet.purpose}` : "";
+      const lane = packet.contextLane ? `@${packet.contextLane}` : "";
+      return `${packet.id}${purpose}${lane}: ${compact(packet.summary, 120)}`;
+    })
+    .join("; ");
+  return `${packets.length} packet(s): ${latest}`;
+}
+
 export function buildFlowAdaptiveSystemContext(
   session: Session | null,
 ): string[] {
@@ -126,6 +148,15 @@ export function buildFlowAdaptiveSystemContext(
     }
   }

+  if (viewModel.session.planning.evidencePackets?.length) {
+    const evidence = compactEvidencePackets(
+      viewModel.session.planning.evidencePackets,
+    );
+    if (evidence) {
+      lines.push(`- context evidence: ${quoted(evidence)}`);
+    }
+  }
+
   if (viewModel.session.planning.standardsProfile) {
     const standards = viewModel.session.planning.standardsProfile;
     const localCount = standards.localGuidelines.length;
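
As a quick illustration of what the new `compactEvidencePackets` helper emits, here is a standalone version. The `compact` function below is a hypothetical stand-in (the real helper is outside this diff, and its truncation style is assumed), and the `EvidencePacket` interface is trimmed to the fields the formatter reads.

```typescript
// Trimmed packet shape: only the fields the formatter touches.
interface EvidencePacket {
  id: string;
  purpose?: string;
  contextLane?: string;
  summary: string;
}

// Hypothetical stand-in for the module's compact helper: collapse whitespace
// and truncate to at most `max` characters.
function compact(text: string, max: number): string {
  const flat = text.replace(/\s+/g, " ").trim();
  return flat.length <= max ? flat : `${flat.slice(0, max - 1)}…`;
}

// Same logic as the function added in the diff above.
function compactEvidencePackets(
  packets: readonly EvidencePacket[],
  limit = 3,
): string | null {
  if (packets.length === 0) {
    return null;
  }
  const latest = packets
    .slice(-limit)
    .map((packet) => {
      const purpose = packet.purpose ? `/${packet.purpose}` : "";
      const lane = packet.contextLane ? `@${packet.contextLane}` : "";
      return `${packet.id}${purpose}${lane}: ${compact(packet.summary, 120)}`;
    })
    .join("; ");
  return `${packets.length} packet(s): ${latest}`;
}

const samplePackets: EvidencePacket[] = [
  { id: "p1", summary: "first" },
  { id: "p2", purpose: "planning", summary: "second" },
  { id: "p3", contextLane: "review", summary: "third" },
  { id: "p4", purpose: "validation", contextLane: "execution", summary: "fourth" },
];

const contextEvidenceLine = compactEvidencePackets(samplePackets);
// "4 packet(s): p2/planning: second; p3@review: third; p4/validation@execution: fourth"
```

The count covers every packet while only the last `limit` entries are listed. One edge worth knowing: a `limit` of `0` would list all packets rather than none, because `slice(-0)` behaves like `slice(0)`.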

src/prompts/contracts.ts

Lines changed: 5 additions & 2 deletions
@@ -33,7 +33,7 @@ Record planning context separately via flow_plan_context_record or flow_plan_app
 - planning.research?: string[]
 - planning.implementationApproach?: { chosenDirection: string, keyConstraints: string[], validationSignals: string[], sources: string[] }
 - planning.decisionLog?: { question: string, decisionMode?: autonomous_choice | recommend_confirm | human_required, decisionDomain?: architecture | product | quality | scope | delivery, options: { label: string, tradeoffs: string[] }[], recommendation: string, rationale: string[] }[]
-- planning.evidencePackets?: { id: string, purpose?: planning | review | audit | validation | general, summary: string, sourceRefs?: string[], highlights?: string[], selectedContext?: string[], excludedContext?: string[], codemapSummaries?: string[], sliceSummaries?: string[], relationshipHypotheses?: string[], ambiguities?: string[], knownExclusions?: string[], alreadyCoveredFindings?: string[], validationEvidence?: { command, status, summary }[] }[]`;
+- planning.evidencePackets?: { id: string, purpose?: planning | review | audit | validation | general, contextLane?: planning | auto_planning | execution | review | status | history | session | reset | doctor | control, summary: string, sourceRefs?: string[], highlights?: string[], selectedContext?: string[], excludedContext?: string[], codemapSummaries?: string[], sliceSummaries?: string[], relationshipHypotheses?: string[], ambiguities?: string[], knownExclusions?: string[], alreadyCoveredFindings?: string[], validationEvidence?: { command, status, summary }[] }[]`;

 export const FLOW_PLAN_CONTRACT = `${FLOW_PLAN_CONTRACT_BASE}

@@ -66,6 +66,7 @@ const FLOW_WORKER_CONTRACT_BASE = `Return exactly one JSON object that matches t
 - validationRun: { command, status: passed | failed | failed_existing | partial, summary }[]
 - decisions: { summary }[]
 - reviewFindingClosures?: { findingRef, status: closed | partially_closed | not_closed | blocked, fixRefs: string[], testRefs: string[], validationRefs: string[], residualRisk }[]
+- evidencePackets?: immutable compact evidence/context packet reference[]
 - nextStep: string
 - reviewIterations?: number
 - validationScope?: targeted | broad
@@ -83,6 +84,7 @@ Status rules:
 - when the active feature is the final completion path for the session, run broad validation, include finalReview from the runtime-owned final review required by deliveryPolicy.finalReviewPolicy (detailed cross-feature by default), set finalReview.reviewDepth to match deliveryPolicy.finalReviewPolicy, and use validationScope: broad
 - finalReview must always include reviewedSurfaces, evidenceSummary, validationAssessment, and evidenceRefs describing what was checked
 - finalReview.evidenceRefs.changedArtifacts must reference actual artifactsChanged paths, and finalReview.evidenceRefs.validationCommands must reference actual validationRun commands from the current run
+- top-level evidencePackets are compact packet references for planning/execution context the worker reused or extended; they do not replace artifactsChanged, validationRun, featureReview, or finalReview evidenceRefs
 - finalReview.evidencePackets is optional read-only metadata for selected/excluded context, exact sources, relationship hypotheses, ambiguities, known exclusions, already-covered findings, and validation evidence; do not use it as a substitute for required finalReview.evidenceRefs
 - finalReview.reviewedSurfaces must cover the execution-derived required surfaces from the current run, including changed_files when artifactsChanged is non-empty, validation_evidence when validationRun is recorded, and any touched docs/prompt, tooling/config, operator, release, or test surfaces
 - when deliveryPolicy.finalReviewPolicy is detailed, include finalReview.integrationChecks and finalReview.regressionChecks, and make sure reviewedSurfaces covers validation_evidence plus at least one cross-feature surface
@@ -120,7 +122,7 @@ export const FLOW_REVIEWER_CONTRACT = `Return exactly one JSON object that match
 - evidenceSummary?: string
 - validationAssessment?: string
 - evidenceRefs?: { changedArtifacts: string[], validationCommands: string[] }
-- evidencePackets?: read-only evidence/context packet[]
+- evidencePackets?: read-only evidence/context packet references for feature reviews; full evidence/context packets for final reviews
 - reviewContextPack?: { task: string, compareBase?: string, changedFiles: string[], includedContext: { path: string, reason: changed_file | imported_dependency | caller | callee | state_owner | lifecycle_owner | architectural_neighbor | test_oracle | validation_evidence, surface?: changed_files | integration_points | shared_surfaces | validation_evidence | tests | operator_surfaces | docs_and_prompts | tooling_and_config | release_surface, summary?: string }[], relationships: { from: string, to: string, kind: string, summary: string }[], validationEvidence: { command: string, status?: string, summary?: string }[], suggestedValidation: string[], coverageGaps: string[], reviewedSurfaces: changed_files | integration_points | shared_surfaces | validation_evidence | tests | operator_surfaces | docs_and_prompts | tooling_and_config | release_surface [] }
 - integrationChecks?: string[]
 - regressionChecks?: string[]
@@ -147,6 +149,7 @@ Reviewer rules:
 - for scope: final, when reviewContextPack is present, keep it grounded: reviewContextPack.changedFiles should map to reviewed changed artifacts, reviewContextPack.includedContext should capture connected context (not duplicate changed files only), and reviewContextPack.reviewedSurfaces should match reviewedSurfaces
 - for scope: final, distinguish directly changed files from connected context in summary/integrationChecks/regressionChecks/remainingGaps, and use remainingGaps to report uncovered product paths, missing or weak test oracles, and validation limits
 - for scope: final, set evidenceRefs.changedArtifacts to actual changed artifact paths you reviewed and evidenceRefs.validationCommands to actual validation commands you relied on
+- feature-scope evidencePackets are compact packet references; final-scope evidencePackets may include full packet metadata, but neither replaces concrete changed path or validation evidence
 - for scope: final, use evidencePackets only as optional read-only context/evidence metadata; do not let packet references replace concrete evidenceRefs
 - for scope: final, cover the execution-derived required surfaces from the current run, including changed_files when artifactsChanged is non-empty, validation_evidence when validationRun is recorded, and any touched docs/prompt, tooling/config, operator, release, or test surfaces
 - for scope: final, when reviewDepth is detailed, include integrationChecks and regressionChecks, and cover validation_evidence plus at least one cross-feature surface
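
The `planning.evidencePackets` contract line in this file, including the new `contextLane` field, can be read as the following TypeScript shape. This is a hand-written transcription for illustration, not the project's schema (which is defined elsewhere and not shown here); the `string` types inside `validationEvidence` are an assumption, since the contract leaves them untyped.

```typescript
// Lane and purpose values copied from the contract text in this diff.
const CONTEXT_LANES = [
  "planning", "auto_planning", "execution", "review", "status",
  "history", "session", "reset", "doctor", "control",
] as const;
type ContextLane = (typeof CONTEXT_LANES)[number];
type EvidencePurpose = "planning" | "review" | "audit" | "validation" | "general";

interface EvidencePacket {
  id: string;
  purpose?: EvidencePurpose;
  contextLane?: ContextLane;
  summary: string;
  sourceRefs?: string[];
  highlights?: string[];
  selectedContext?: string[];
  excludedContext?: string[];
  codemapSummaries?: string[];
  sliceSummaries?: string[];
  relationshipHypotheses?: string[];
  ambiguities?: string[];
  knownExclusions?: string[];
  alreadyCoveredFindings?: string[];
  // Field types assumed to be strings; the contract only names the keys.
  validationEvidence?: { command: string; status: string; summary: string }[];
}

// Hypothetical runtime guard for untyped input such as parsed JSON.
function isContextLane(value: string): value is ContextLane {
  return (CONTEXT_LANES as readonly string[]).indexOf(value) !== -1;
}

// Example packet with made-up values.
const examplePacket: EvidencePacket = {
  id: "review-ctx-1",
  purpose: "review",
  contextLane: "review",
  summary: "Reducer now routes planning merges through one helper",
  sourceRefs: ["src/core/workflow/reducer.ts"],
  validationEvidence: [{ command: "bun run check", status: "passed", summary: "all checks green" }],
};
```

Only `id` and `summary` are required, so a packet can start small and gain source refs or validation evidence when it is refreshed under the same id.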

src/prompts/fragments.ts

Lines changed: 7 additions & 1 deletion
@@ -63,11 +63,17 @@ export const FLOW_RELEASE_HYGIENE_REVIEW_RULE =
   "- Treat release hygiene as a review gate: do not approve work that leaves raw console calls, debugger statements, or undocumented debug-only instrumentation in release-bound source or build artifacts, do not approve changes that delete intentional operator/observability signals without evidence of an equivalent logger, telemetry, or stdout/stderr replacement preserving severity, message intent, and key context, and do not approve a new logging or telemetry dependency unless it was explicitly approved.";
 export const FLOW_REVIEW_CONTEXT_DISCOVERY_RULE =
   "- Treat changed files as the review seed, not the boundary: include connected context discovered through callers/callees, state or lifecycle owners, architectural neighbors, tests, and validation evidence; distinguish directly changed files from connected context and report coverage gaps/validation limits explicitly.";
+export const FLOW_CONTEXT_GATHERING_RUNTIME_RULE =
+  "- Treat context gathering as a Flow-wide runtime contract: gather or reuse source-backed evidence before planning, execution, or review claims; persist durable repo evidence through planning.evidencePackets/sourceRefs when it should survive across commands; changed files are seeds, not boundaries; control/status surfaces only report existing context.";
+export const FLOW_CONTEXT_GATHERING_READONLY_RULE =
+  "- Treat context gathering as a read-only evidence contract: gather or reuse source-backed evidence before planning or review claims; return durable repo evidence as evidencePackets/sourceRefs for the planner, coordinator, or runtime owner to persist; changed files are seeds, not boundaries; control/status surfaces only report existing context.";

 export const FLOW_ADVERSARIAL_FAILURE_MODE_REVIEW_RULE =
   "- Review changed behavior through applicable adversarial failure-mode classes before approving: lifecycle/reentrancy/idempotency, async races/event ordering, persistence failure and recovery, interaction geometry/hit-testing, accessibility semantics/live regions, and test-oracle authenticity. When a class is applicable, cite the concrete path checked in summary, integrationChecks, regressionChecks, blockingFindings, followUps, or suggestedValidation; when it is not applicable, do not force a finding.";
-export const FLOW_STACK_STANDARDS_PROFILE_RULE =
+export const FLOW_STACK_STANDARDS_PROFILE_RUNTIME_RULE =
   "- Treat planning.stackProfile and planning.standardsProfile as the runtime-owned stack and standards profile: local repo guidance outranks official docs, official docs outrank broader Exa/websearch guidance, resolve planning.standardsProfile.gaps by using available MCP tools first (Ref MCP for official docs, Exa for current ecosystem best-practice synthesis), and websearch/webfetch only as fallback; record researched sources and resulting rules through flow_plan_context_record; keep external guidance bounded to the detected stack and never change package/dependency versions from standards research alone.";
+export const FLOW_STACK_STANDARDS_PROFILE_READONLY_RULE =
+  "- Treat planning.stackProfile and planning.standardsProfile as runtime-owned stack and standards profiles: local repo guidance outranks official docs, official docs outrank broader Exa/websearch guidance, resolve standards gaps with available source-backed research, return researched sources and resulting rules for the planner, coordinator, or runtime owner to persist, keep external guidance bounded to the detected stack, and never change package/dependency versions from standards research alone.";

 export const FLOW_PACKAGE_MANAGER_PRIMARY_CONTRACT_RULE =
   "- Treat existing package.json scripts as the primary execution contract; invoke them through the detected package manager or the repo's established script-running convention. Package-manager detection is supporting evidence. Do not assume Bun unless repo evidence says Bun.";
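
The runtime/read-only split in these fragments could be wired up and guarded roughly as follows. The role names, `contextGatheringRuleFor`, and `violatesReadOnlyContract` are all hypothetical, and the two rule strings are abbreviated excerpts of the fragments above; how the project actually assembles prompts or tests them is not shown in this diff.

```typescript
// Hypothetical role names, for illustration only.
type PromptRole = "planner" | "coordinator" | "researcher" | "reviewer";

const READ_ONLY_ROLES: ReadonlySet<PromptRole> = new Set<PromptRole>([
  "researcher",
  "reviewer",
]);

// Abbreviated excerpts of the two context-gathering fragments.
const RUNTIME_RULE =
  "persist durable repo evidence through planning.evidencePackets/sourceRefs";
const READONLY_RULE =
  "return durable repo evidence as evidencePackets/sourceRefs for the planner, coordinator, or runtime owner to persist";

// Pick the fragment by role so read-only roles never receive
// persistence instructions.
function contextGatheringRuleFor(role: PromptRole): string {
  return READ_ONLY_ROLES.has(role) ? READONLY_RULE : RUNTIME_RULE;
}

// Contract-style check: a read-only prompt must not name a runtime
// persistence tool such as flow_plan_context_record.
const RUNTIME_TOOL_NAMES = ["flow_plan_context_record"];
function violatesReadOnlyContract(prompt: string): boolean {
  return RUNTIME_TOOL_NAMES.some((tool) => prompt.indexOf(tool) !== -1);
}

const reviewerRule = contextGatheringRuleFor("reviewer");
const plannerRule = contextGatheringRuleFor("planner");
```

A check like `violatesReadOnlyContract` is the kind of guard that would catch a shared fragment leaking "record ... through flow_plan_context_record" into a reviewer prompt, which is the contradiction the commit cites when rejecting a single shared fragment.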
