ddv1982
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 20 additions & 0 deletions
diff --git a/docs/releases/v2.0.9.md b/docs/releases/v2.0.9.md
Lines changed: 19 additions & 0 deletions
diff --git a/package.json b/package.json
Lines changed: 1 addition & 1 deletion
diff --git a/src/core/workflow/reducer.ts b/src/core/workflow/reducer.ts
Lines changed: 3 additions & 8 deletions
diff --git a/src/prompt-system-context.ts b/src/prompt-system-context.ts
Lines changed: 32 additions & 1 deletion
diff --git a/src/prompts/contracts.ts b/src/prompts/contracts.ts
Lines changed: 5 additions & 2 deletions
diff --git a/src/prompts/fragments.ts b/src/prompts/fragments.ts
Lines changed: 7 additions & 1 deletion
@@ -2,6 +2,26 @@
 
 ## [Unreleased]
 
+## [2.0.9] - 2026-05-06
+
+Refresh planning evidence packets without preserving stale context
+
+Flow 2.0.9 turns planning context evidence into an explicit durable packet ledger while keeping the workflow surface stable. Planning, execution, review, and final-review schemas can now carry source-backed evidence packets for selected context, exclusions, relationship hypotheses, ambiguity notes, covered findings, and validation evidence, and runtime planning context merges those packets through a shared domain helper instead of duplicating merge behavior across transitions.
+
+The release also closes the review risks found during hardening. Same-id evidence packets now refresh wholesale so replans can retract stale source refs or selected/excluded context instead of unioning obsolete evidence forever. Prompt guidance is split between runtime-owner and read-only roles, so planning researcher and reviewer prompts return evidence for a planner/coordinator/runtime owner to persist rather than telling read-only roles to call planning runtime tools. Tool schema budgets were tightened around the measured evidence-packet growth so future unrelated schema bloat still fails fast.
+
+Constraint: Add source-backed planning/review evidence packets without adding commands, tools, state paths, package exports, or dependency versions
+Constraint: Keep `zod` aligned with `@opencode-ai/plugin`; no dependency-version changes in this patch
+Rejected: Preserve same-id packet arrays by unioning old and new context | stale refs and selected/excluded context would survive replans and weaken evidence accuracy
+Rejected: Reuse one prompt fragment for runtime owners and read-only roles | it gives reviewers/researchers contradictory persistence instructions
+Rejected: Leave broad raw-schema ceilings after evidence-packet growth | oversized budgets hide unrelated future tool-schema drift
+Confidence: high
+Scope-risk: moderate
+Reversibility: clean
+Directive: Treat same-id evidence packets as refreshes, not append logs; use new packet ids for additive evidence and keep runtime-tool persistence instructions out of read-only prompt surfaces
+Tested: `bun run typecheck`; `bun run lint`; `bun test tests/runtime/evidence-packets.test.ts tests/config/tool-schemas.test.ts tests/config/prompt-contracts.test.ts tests/runtime-hooks.test.ts tests/runtime/workflow-core-reducer.test.ts` (54 pass, 950 expect calls); `bun run check`
+Not-tested: Live GitHub-hosted CI/release workflow runs for tag `v2.0.9` before push
+
 ## [2.0.8] - 2026-05-06
 
 Ground final review coverage in canonical evidence
 
@@ -0,0 +1,19 @@
+# v2.0.9
+
+Refresh planning evidence packets without preserving stale context
+
+Flow 2.0.9 turns planning context evidence into an explicit durable packet ledger while keeping the workflow surface stable. Planning, execution, review, and final-review schemas can now carry source-backed evidence packets for selected context, exclusions, relationship hypotheses, ambiguity notes, covered findings, and validation evidence, and runtime planning context merges those packets through a shared domain helper instead of duplicating merge behavior across transitions.
+
+The release also closes the review risks found during hardening. Same-id evidence packets now refresh wholesale so replans can retract stale source refs or selected/excluded context instead of unioning obsolete evidence forever. Prompt guidance is split between runtime-owner and read-only roles, so planning researcher and reviewer prompts return evidence for a planner/coordinator/runtime owner to persist rather than telling read-only roles to call planning runtime tools. Tool schema budgets were tightened around the measured evidence-packet growth so future unrelated schema bloat still fails fast.
+
+Constraint: Add source-backed planning/review evidence packets without adding commands, tools, state paths, package exports, or dependency versions
+Constraint: Keep `zod` aligned with `@opencode-ai/plugin`; no dependency-version changes in this patch
+Rejected: Preserve same-id packet arrays by unioning old and new context | stale refs and selected/excluded context would survive replans and weaken evidence accuracy
+Rejected: Reuse one prompt fragment for runtime owners and read-only roles | it gives reviewers/researchers contradictory persistence instructions
+Rejected: Leave broad raw-schema ceilings after evidence-packet growth | oversized budgets hide unrelated future tool-schema drift
+Confidence: high
+Scope-risk: moderate
+Reversibility: clean
+Directive: Treat same-id evidence packets as refreshes, not append logs; use new packet ids for additive evidence and keep runtime-tool persistence instructions out of read-only prompt surfaces
+Tested: `bun run typecheck`; `bun run lint`; `bun test tests/runtime/evidence-packets.test.ts tests/config/tool-schemas.test.ts tests/config/prompt-contracts.test.ts tests/runtime-hooks.test.ts tests/runtime/workflow-core-reducer.test.ts` (54 pass, 950 expect calls); `bun run check`
+Not-tested: Live GitHub-hosted CI/release workflow runs for tag `v2.0.9` before push
@@ -1,6 +1,6 @@
 {
   "name": "opencode-plugin-flow",
-  "version": "2.0.8",
+  "version": "2.0.9",
   "description": "Stateful planning and execution workflow plugin for OpenCode",
   "type": "module",
   "main": "dist/index.js",
 
@@ -1,5 +1,6 @@
 import type { Plan, Session } from "../../workflow/contracts";
 import { SessionSchema } from "../../workflow/contracts";
+import { mergePlanningContext } from "../../workflow/domain";
 import type { WorkflowEvent } from "./events";
 import { createInitialWorkflowState, type WorkflowState } from "./state";
 
@@ -58,10 +59,7 @@ function reducePlanApplied(
 		approval: "pending",
 		closure: null,
 		notes: [],
-		planning: {
-			...state.planning,
-			...event.planning,
-		},
+		planning: mergePlanningContext(state.planning, event.planning ?? {}),
 		execution: clearExecutionProjection(state),
 		timestamps: {
 			...state.timestamps,
@@ -169,10 +167,7 @@ export function applyWorkflowEvent(
 			const current = assertState(state, event);
 			return parseState({
 				...current,
-				planning: {
-					...current.planning,
-					...event.planning,
-				},
+				planning: mergePlanningContext(current.planning, event.planning),
 				timestamps: {
 					...current.timestamps,
 					updatedAt: event.recordedAt,
 
@@ -1,5 +1,9 @@
 import type { StackStandardsProfileCacheValue } from "./runtime/application/stack-standards-profile";
-import type { Session, StandardsProfile } from "./runtime/schema";
+import type {
+	EvidencePacket,
+	Session,
+	StandardsProfile,
+} from "./runtime/schema";
 import { deriveSessionViewModel } from "./runtime/summary";
 
 const FLOW_RUNTIME_CONTEXT_MARKER =
@@ -50,6 +54,24 @@ function compactStandardsGaps(
 	return gaps.length > 0 ? gaps.join("; ") : null;
 }
 
+function compactEvidencePackets(
+	packets: readonly EvidencePacket[],
+	limit = 3,
+): string | null {
+	if (packets.length === 0) {
+		return null;
+	}
+	const latest = packets
+		.slice(-limit)
+		.map((packet) => {
+			const purpose = packet.purpose ? `/${packet.purpose}` : "";
+			const lane = packet.contextLane ? `@${packet.contextLane}` : "";
+			return `${packet.id}${purpose}${lane}: ${compact(packet.summary, 120)}`;
+		})
+		.join("; ");
+	return `${packets.length} packet(s): ${latest}`;
+}
+
 export function buildFlowAdaptiveSystemContext(
 	session: Session | null,
 ): string[] {
@@ -126,6 +148,15 @@ export function buildFlowAdaptiveSystemContext(
 		}
 	}
 
+	if (viewModel.session.planning.evidencePackets?.length) {
+		const evidence = compactEvidencePackets(
+			viewModel.session.planning.evidencePackets,
+		);
+		if (evidence) {
+			lines.push(`- context evidence: ${quoted(evidence)}`);
+		}
+	}
+
 	if (viewModel.session.planning.standardsProfile) {
 		const standards = viewModel.session.planning.standardsProfile;
 		const localCount = standards.localGuidelines.length;
 
@@ -33,7 +33,7 @@ Record planning context separately via flow_plan_context_record or flow_plan_app
 - planning.research?: string[]
 - planning.implementationApproach?: { chosenDirection: string, keyConstraints: string[], validationSignals: string[], sources: string[] }
 - planning.decisionLog?: { question: string, decisionMode?: autonomous_choice | recommend_confirm | human_required, decisionDomain?: architecture | product | quality | scope | delivery, options: { label: string, tradeoffs: string[] }[], recommendation: string, rationale: string[] }[]
-- planning.evidencePackets?: { id: string, purpose?: planning | review | audit | validation | general, summary: string, sourceRefs?: string[], highlights?: string[], selectedContext?: string[], excludedContext?: string[], codemapSummaries?: string[], sliceSummaries?: string[], relationshipHypotheses?: string[], ambiguities?: string[], knownExclusions?: string[], alreadyCoveredFindings?: string[], validationEvidence?: { command, status, summary }[] }[]`;
+- planning.evidencePackets?: { id: string, purpose?: planning | review | audit | validation | general, contextLane?: planning | auto_planning | execution | review | status | history | session | reset | doctor | control, summary: string, sourceRefs?: string[], highlights?: string[], selectedContext?: string[], excludedContext?: string[], codemapSummaries?: string[], sliceSummaries?: string[], relationshipHypotheses?: string[], ambiguities?: string[], knownExclusions?: string[], alreadyCoveredFindings?: string[], validationEvidence?: { command, status, summary }[] }[]`;
 
 export const FLOW_PLAN_CONTRACT = `${FLOW_PLAN_CONTRACT_BASE}
 
@@ -66,6 +66,7 @@ const FLOW_WORKER_CONTRACT_BASE = `Return exactly one JSON object that matches t
 - validationRun: { command, status: passed | failed | failed_existing | partial, summary }[]
 - decisions: { summary }[]
 - reviewFindingClosures?: { findingRef, status: closed | partially_closed | not_closed | blocked, fixRefs: string[], testRefs: string[], validationRefs: string[], residualRisk }[]
+- evidencePackets?: immutable compact evidence/context packet reference[]
 - nextStep: string
 - reviewIterations?: number
 - validationScope?: targeted | broad
@@ -83,6 +84,7 @@ Status rules:
 - when the active feature is the final completion path for the session, run broad validation, include finalReview from the runtime-owned final review required by deliveryPolicy.finalReviewPolicy (detailed cross-feature by default), set finalReview.reviewDepth to match deliveryPolicy.finalReviewPolicy, and use validationScope: broad
 - finalReview must always include reviewedSurfaces, evidenceSummary, validationAssessment, and evidenceRefs describing what was checked
 - finalReview.evidenceRefs.changedArtifacts must reference actual artifactsChanged paths, and finalReview.evidenceRefs.validationCommands must reference actual validationRun commands from the current run
+- top-level evidencePackets are compact packet references for planning/execution context the worker reused or extended; they do not replace artifactsChanged, validationRun, featureReview, or finalReview evidenceRefs
 - finalReview.evidencePackets is optional read-only metadata for selected/excluded context, exact sources, relationship hypotheses, ambiguities, known exclusions, already-covered findings, and validation evidence; do not use it as a substitute for required finalReview.evidenceRefs
 - finalReview.reviewedSurfaces must cover the execution-derived required surfaces from the current run, including changed_files when artifactsChanged is non-empty, validation_evidence when validationRun is recorded, and any touched docs/prompt, tooling/config, operator, release, or test surfaces
 - when deliveryPolicy.finalReviewPolicy is detailed, include finalReview.integrationChecks and finalReview.regressionChecks, and make sure reviewedSurfaces covers validation_evidence plus at least one cross-feature surface
@@ -120,7 +122,7 @@ export const FLOW_REVIEWER_CONTRACT = `Return exactly one JSON object that match
 - evidenceSummary?: string
 - validationAssessment?: string
 - evidenceRefs?: { changedArtifacts: string[], validationCommands: string[] }
-- evidencePackets?: read-only evidence/context packet[]
+- evidencePackets?: read-only evidence/context packet references for feature reviews; full evidence/context packets for final reviews
 - reviewContextPack?: { task: string, compareBase?: string, changedFiles: string[], includedContext: { path: string, reason: changed_file | imported_dependency | caller | callee | state_owner | lifecycle_owner | architectural_neighbor | test_oracle | validation_evidence, surface?: changed_files | integration_points | shared_surfaces | validation_evidence | tests | operator_surfaces | docs_and_prompts | tooling_and_config | release_surface, summary?: string }[], relationships: { from: string, to: string, kind: string, summary: string }[], validationEvidence: { command: string, status?: string, summary?: string }[], suggestedValidation: string[], coverageGaps: string[], reviewedSurfaces: changed_files | integration_points | shared_surfaces | validation_evidence | tests | operator_surfaces | docs_and_prompts | tooling_and_config | release_surface [] }
 - integrationChecks?: string[]
 - regressionChecks?: string[]
@@ -147,6 +149,7 @@ Reviewer rules:
 - for scope: final, when reviewContextPack is present, keep it grounded: reviewContextPack.changedFiles should map to reviewed changed artifacts, reviewContextPack.includedContext should capture connected context (not duplicate changed files only), and reviewContextPack.reviewedSurfaces should match reviewedSurfaces
 - for scope: final, distinguish directly changed files from connected context in summary/integrationChecks/regressionChecks/remainingGaps, and use remainingGaps to report uncovered product paths, missing or weak test oracles, and validation limits
 - for scope: final, set evidenceRefs.changedArtifacts to actual changed artifact paths you reviewed and evidenceRefs.validationCommands to actual validation commands you relied on
+- feature-scope evidencePackets are compact packet references; final-scope evidencePackets may include full packet metadata, but neither replaces concrete changed path or validation evidence
 - for scope: final, use evidencePackets only as optional read-only context/evidence metadata; do not let packet references replace concrete evidenceRefs
 - for scope: final, cover the execution-derived required surfaces from the current run, including changed_files when artifactsChanged is non-empty, validation_evidence when validationRun is recorded, and any touched docs/prompt, tooling/config, operator, release, or test surfaces
 - for scope: final, when reviewDepth is detailed, include integrationChecks and regressionChecks, and cover validation_evidence plus at least one cross-feature surface
 
@@ -63,11 +63,17 @@ export const FLOW_RELEASE_HYGIENE_REVIEW_RULE =
 	"- Treat release hygiene as a review gate: do not approve work that leaves raw console calls, debugger statements, or undocumented debug-only instrumentation in release-bound source or build artifacts, do not approve changes that delete intentional operator/observability signals without evidence of an equivalent logger, telemetry, or stdout/stderr replacement preserving severity, message intent, and key context, and do not approve a new logging or telemetry dependency unless it was explicitly approved.";
 export const FLOW_REVIEW_CONTEXT_DISCOVERY_RULE =
 	"- Treat changed files as the review seed, not the boundary: include connected context discovered through callers/callees, state or lifecycle owners, architectural neighbors, tests, and validation evidence; distinguish directly changed files from connected context and report coverage gaps/validation limits explicitly.";
+export const FLOW_CONTEXT_GATHERING_RUNTIME_RULE =
+	"- Treat context gathering as a Flow-wide runtime contract: gather or reuse source-backed evidence before planning, execution, or review claims; persist durable repo evidence through planning.evidencePackets/sourceRefs when it should survive across commands; changed files are seeds, not boundaries; control/status surfaces only report existing context.";
+export const FLOW_CONTEXT_GATHERING_READONLY_RULE =
+	"- Treat context gathering as a read-only evidence contract: gather or reuse source-backed evidence before planning or review claims; return durable repo evidence as evidencePackets/sourceRefs for the planner, coordinator, or runtime owner to persist; changed files are seeds, not boundaries; control/status surfaces only report existing context.";
 
 export const FLOW_ADVERSARIAL_FAILURE_MODE_REVIEW_RULE =
 	"- Review changed behavior through applicable adversarial failure-mode classes before approving: lifecycle/reentrancy/idempotency, async races/event ordering, persistence failure and recovery, interaction geometry/hit-testing, accessibility semantics/live regions, and test-oracle authenticity. When a class is applicable, cite the concrete path checked in summary, integrationChecks, regressionChecks, blockingFindings, followUps, or suggestedValidation; when it is not applicable, do not force a finding.";
-export const FLOW_STACK_STANDARDS_PROFILE_RULE =
+export const FLOW_STACK_STANDARDS_PROFILE_RUNTIME_RULE =
 	"- Treat planning.stackProfile and planning.standardsProfile as the runtime-owned stack and standards profile: local repo guidance outranks official docs, official docs outrank broader Exa/websearch guidance, resolve planning.standardsProfile.gaps by using available MCP tools first (Ref MCP for official docs, Exa for current ecosystem best-practice synthesis), and websearch/webfetch only as fallback; record researched sources and resulting rules through flow_plan_context_record; keep external guidance bounded to the detected stack and never change package/dependency versions from standards research alone.";
+export const FLOW_STACK_STANDARDS_PROFILE_READONLY_RULE =
+	"- Treat planning.stackProfile and planning.standardsProfile as runtime-owned stack and standards profiles: local repo guidance outranks official docs, official docs outrank broader Exa/websearch guidance, resolve standards gaps with available source-backed research, return researched sources and resulting rules for the planner, coordinator, or runtime owner to persist, keep external guidance bounded to the detected stack, and never change package/dependency versions from standards research alone.";
 
 export const FLOW_PACKAGE_MANAGER_PRIMARY_CONTRACT_RULE =
 	"- Treat existing package.json scripts as the primary execution contract; invoke them through the detected package manager or the repo's established script-running convention. Package-manager detection is supporting evidence. Do not assume Bun unless repo evidence says Bun.";
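
Note on the "refresh, not append" directive: the diff routes both reducer transitions through `mergePlanningContext` but does not show its body. The following is a minimal sketch of the same-id refresh behavior the release notes describe, not the actual helper. `EvidencePacketSketch` and `refreshEvidencePackets` are hypothetical names, the packet shape is trimmed to a few fields, and keeping a refreshed packet in its original position is an assumption.

```typescript
// Hypothetical, trimmed-down packet shape; the real EvidencePacket schema has more fields.
interface EvidencePacketSketch {
	id: string;
	summary: string;
	sourceRefs?: string[];
	selectedContext?: string[];
	excludedContext?: string[];
}

// Assumed semantics: an incoming packet whose id already exists replaces the
// stored packet wholesale (no per-array union); a new id is appended.
function refreshEvidencePackets(
	existing: readonly EvidencePacketSketch[],
	incoming: readonly EvidencePacketSketch[],
): EvidencePacketSketch[] {
	const byId = new Map<string, EvidencePacketSketch>();
	for (const packet of existing) {
		byId.set(packet.id, packet);
	}
	for (const packet of incoming) {
		// Map.set on an existing key keeps the key's position and swaps the value.
		byId.set(packet.id, packet);
	}
	return Array.from(byId.values());
}

const stored: EvidencePacketSketch[] = [
	{ id: "ctx-1", summary: "initial plan context", sourceRefs: ["src/a.ts", "src/b.ts"] },
];
const replanned: EvidencePacketSketch[] = [
	{ id: "ctx-1", summary: "replanned context", sourceRefs: ["src/a.ts"] },
	{ id: "ctx-2", summary: "additive evidence" },
];
const merged = refreshEvidencePackets(stored, replanned);
```

Under these assumptions, `merged` holds two packets, and `ctx-1` carries only `src/a.ts`: the stale `src/b.ts` ref is retracted rather than unioned. Additive evidence goes under a new id such as `ctx-2`, which matches the directive in the release notes.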