Log claude model in backfill#134
Conversation
Dual-agent review —
|
| Source | Finding (severity, evidence) | Intersects |
|---|---|---|
| Codex | Contract & Interface Fidelity — backfilled user/request rows model-less vs live (major, conf medium; message_projector.js:646/655, claude/types.d.ts:38) | Risks → live-vs-backfill divergence; Cross-package usage (claude→ai-gateway contract). Two Claude reviewers independently classified it as pre-existing/by-design, not introduced by this PR. |
| Claude | Missing @ref LLP 0026 annotation (minor, conf 85; message_projector.js:652, transcripts.js:528, backfill.js:394) |
Targets (all touched symbols); LLP-governance surface, not a behavioral risk. |
| Claude | Per-message-WINS precedence untested (minor, conf 88; claude-backfill.test.js:368, message_projector.js:655) | Risks → precedence unverified (direct hit). |
Codex review
Fix Validations
Backfilled Claude assistant rows missing model
- Status: correct
- Evidence: hypaware-core/plugins-workspace/claude/src/transcripts.js:452, hypaware-core/plugins-workspace/claude/src/backfill.js:394, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:652, test/plugins/claude-backfill.test.js:422
- Assessment: The model is read from
message.model, copied onto the projected message, and preferred during gateway row expansion. The regression test covers two assistant rows with different models.
Synthetic Claude assistant rows should not emit a fake model
- Status: correct
- Evidence: hypaware-core/plugins-workspace/claude/src/transcripts.js:535, test/plugins/claude-backfill.test.js:424
- Assessment:
<synthetic>is filtered before projection, so it cannot land in the gatewaymodelcolumn.
Findings
2) Contract & Interface Fidelity
- Severity: major
- Confidence: medium
- Evidence: hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:646, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:655, hypaware-core/plugins-workspace/claude/src/types.d.ts:38, test/plugins/claude-backfill.test.js:426
- Why it matters: Live gateway expansion puts the exchange
modelon every message-part row, but this PR intentionally leaves backfilled user rows without a model, so backfilled prompt/user rows still diverge from live capture and remain unqueryable by model. - Suggested fix: Either propagate the associated assistant response model onto the corresponding backfilled user/request rows, or update the contract/commentary and tests to state that backfill model fidelity is assistant-output-only.
No Finding
- Behavioral Correctness
- Change Impact / Blast Radius
- Concurrency, Ordering & State Safety
- Error Handling & Resilience
- Security Surface
- Resource Lifecycle & Cleanup
- Release Safety
- Test Evidence Quality
- Architectural Consistency
- Debuggability & Operability
Evidence Bundle
- Changed hot paths: transcript JSONL parsing, Claude backfill projection, AI gateway message-part expansion, Claude backfill regression tests.
- Impacted callers: hypaware-core/plugins-workspace/claude/src/transcripts.js:416, hypaware-core/plugins-workspace/claude/src/backfill.js:306, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:451.
- Impacted tests: test/plugins/claude-backfill.test.js:368.
- Unresolved uncertainty: I did not run the test suite; review is based on the supplied diff plus targeted worktree tracing.
Claude review
Claude review
Per-message-model decision lacks the mandated @ref LLP 0026 annotation
- Severity: minor
- Confidence: 85
- Evidence: hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:652, hypaware-core/plugins-workspace/claude/src/transcripts.js:528, hypaware-core/plugins-workspace/claude/src/backfill.js:394
- Why it matters: CLAUDE.md requires
// @ref LLP NNNN#anchoron code realizing a non-obvious documented decision; "per-message model wins over the exchange model" and "<synthetic>is a sentinel, drop it" are exactly that (LLP 0026#decision/#consequences), and every sibling field in these same functions already carries a ref (e.g. message_projector.js:580/663, transcripts.js:187/235), so the omission leaves a gap in the/ref-storyrationale view precisely where a governed decision was implemented. - Suggested fix: Add
// @ref LLP 0026#consequences [implements] — per-message model mirrors the transcript's per-line model; exchange model is the fallbackabove themodel:line in message_projector.js, and// @ref LLP 0026#decision — per-line model; <synthetic> is a local-only sentinel, not a model idabovetranscriptModelin transcripts.js. Both anchors already exist in llp/0026-claude-native-granularity.decision.md.
Per-message-WINS-over-exchange-model precedence is not actually tested
- Severity: minor
- Confidence: 88
- Evidence: test/plugins/claude-backfill.test.js:368, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:655
- Why it matters: The new test only exercises the case where the exchange-level
modelisundefined(claude backfill never setsprojection.model), so it proves "per-message model surfaces" but not "per-message beats a different exchange model" — reversing the operands toctx.projection.model ?? stringValue(ctx.message.model)would leave every assertion green, so the most load-bearing claim in the diff (asserted in three comments and the.d.ts) is unverified. - Suggested fix: Add one assertion where
projection.modeldiffers from the per-message model — e.g. a directexpandMessageParts/aiGatewayRowsFromProjectedExchangeunit test withprojection.model = 'exchange-model'andmessage.model = 'msg-model'assertingrow.model === 'msg-model', plus an assistant row with no per-message model to lock the fallback direction.
Reports: /Users/phil/workspace/hypaware/.git/dual-review/pr-134
…ref Resolves the findings from the dual-agent review of this PR. No behavior change — the model expression is untouched; only docs, a test, and annotations are added. - Codex (contract fidelity): make the live-vs-backfill model granularity explicit. Live capture has one model per exchange and stamps it on every row; backfill surfaces the per-line message.model, which the transcript records on assistant lines only, so backfilled user/tool_result rows carry no model (assistant-output-only). Documented in the AiGatewayProjectedMessage and TranscriptEntry doc comments, a new LLP 0026 Consequences bullet, and the backfill test's user-row assertion. - Claude (untested precedence): add an ai-gateway projector test asserting the per-message model wins over a *different* exchange model and falls back to the exchange model when absent — reversing the ?? operands now fails a test. - Claude (missing annotation): @ref LLP 0026 on the transcript producer (transcriptModel, #decision) and the gateway model fallback (#consequences). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the dual-review findings in 57b0da1 (no behavior change — only docs, a test, and
All touched test files green (45/45). The decision-map was skipped on the prior run because of the major; that's now resolved. |
Our backfilled claude logs were missing the model. This fixes it.