Log claude model in backfill by bgmcmullen · Pull Request #134 · hyparam/hypaware

bgmcmullen · 2026-06-23T19:26:46Z

Our backfilled claude logs were missing the model. This fixes it.

philcunliffe · 2026-06-23T22:48:53Z

Dual-agent review — `request_changes`

Verdict: request_changes
Risk class: medium
Auto-merge advisory: 👎 thumbs down — verdict is request_changes; needs human-gated follow-up

Advisory only: no merge was attempted. Reviewed in an isolated worktree at the PR head; your working tree was untouched.

Note on the verdict: the single major is Codex's contract-fidelity finding (backfilled user/request rows stay model-less while live capture stamps the exchange model on them). Both independent Claude reviewers traced this as a pre-existing divergence (backfill never carried an exchange-level model) that this PR does not introduce or worsen — the change strictly improves assistant-row accuracy. The mechanical rubric still counts a category 1–8 major as request_changes; treat it as "confirm the user-row policy is intended," not "the code is wrong." Both remaining Claude findings are minor.

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Source	Finding (severity, evidence)	Intersects
Codex	Contract & Interface Fidelity — backfilled user/request rows model-less vs live (major, conf medium; message_projector.js:646/655, claude/types.d.ts:38)	Risks → live-vs-backfill divergence; Cross-package usage (claude→ai-gateway contract). Two Claude reviewers independently classified it as pre-existing/by-design, not introduced by this PR.
Claude	Missing `@ref LLP 0026` annotation (minor, conf 85; message_projector.js:652, transcripts.js:528, backfill.js:394)	Targets (all touched symbols); LLP-governance surface, not a behavioral risk.
Claude	Per-message-WINS precedence untested (minor, conf 88; claude-backfill.test.js:368, message_projector.js:655)	Risks → precedence unverified (direct hit).

Codex review

Fix Validations

Backfilled Claude assistant rows missing model

Status: correct
Evidence: hypaware-core/plugins-workspace/claude/src/transcripts.js:452, hypaware-core/plugins-workspace/claude/src/backfill.js:394, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:652, test/plugins/claude-backfill.test.js:422
Assessment: The model is read from message.model, copied onto the projected message, and preferred during gateway row expansion. The regression test covers two assistant rows with different models.

Synthetic Claude assistant rows should not emit a fake model

Status: correct
Evidence: hypaware-core/plugins-workspace/claude/src/transcripts.js:535, test/plugins/claude-backfill.test.js:424
Assessment: <synthetic> is filtered before projection, so it cannot land in the gateway model column.

Findings

2) Contract & Interface Fidelity

Severity: major
Confidence: medium
Evidence: hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:646, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:655, hypaware-core/plugins-workspace/claude/src/types.d.ts:38, test/plugins/claude-backfill.test.js:426
Why it matters: Live gateway expansion puts the exchange model on every message-part row, but this PR intentionally leaves backfilled user rows without a model, so backfilled prompt/user rows still diverge from live capture and remain unqueryable by model.
Suggested fix: Either propagate the associated assistant response model onto the corresponding backfilled user/request rows, or update the contract/commentary and tests to state that backfill model fidelity is assistant-output-only.

No Finding

Behavioral Correctness
Change Impact / Blast Radius
Concurrency, Ordering & State Safety
Error Handling & Resilience
Security Surface
Resource Lifecycle & Cleanup
Release Safety
Test Evidence Quality
Architectural Consistency
Debuggability & Operability

Evidence Bundle

Changed hot paths: transcript JSONL parsing, Claude backfill projection, AI gateway message-part expansion, Claude backfill regression tests.
Impacted callers: hypaware-core/plugins-workspace/claude/src/transcripts.js:416, hypaware-core/plugins-workspace/claude/src/backfill.js:306, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:451.
Impacted tests: test/plugins/claude-backfill.test.js:368.
Unresolved uncertainty: I did not run the test suite; review is based on the supplied diff plus targeted worktree tracing.

Claude review

Claude review

Per-message-model decision lacks the mandated `@ref LLP 0026` annotation

Severity: minor
Confidence: 85
Evidence: hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:652, hypaware-core/plugins-workspace/claude/src/transcripts.js:528, hypaware-core/plugins-workspace/claude/src/backfill.js:394
Why it matters: CLAUDE.md requires // @ref LLP NNNN#anchor on code realizing a non-obvious documented decision; "per-message model wins over the exchange model" and "<synthetic> is a sentinel, drop it" are exactly that (LLP 0026 #decision/#consequences), and every sibling field in these same functions already carries a ref (e.g. message_projector.js:580/663, transcripts.js:187/235), so the omission leaves a gap in the /ref-story rationale view precisely where a governed decision was implemented.
Suggested fix: Add // @ref LLP 0026#consequences [implements] — per-message model mirrors the transcript's per-line model; exchange model is the fallback above the model: line in message_projector.js, and // @ref LLP 0026#decision — per-line model; <synthetic> is a local-only sentinel, not a model id above transcriptModel in transcripts.js. Both anchors already exist in llp/0026-claude-native-granularity.decision.md.

Per-message-WINS-over-exchange-model precedence is not actually tested

Severity: minor
Confidence: 88
Evidence: test/plugins/claude-backfill.test.js:368, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:655
Why it matters: The new test only exercises the case where the exchange-level model is undefined (claude backfill never sets projection.model), so it proves "per-message model surfaces" but not "per-message beats a different exchange model" — reversing the operands to ctx.projection.model ?? stringValue(ctx.message.model) would leave every assertion green, so the most load-bearing claim in the diff (asserted in three comments and the .d.ts) is unverified.
Suggested fix: Add one assertion where projection.model differs from the per-message model — e.g. a direct expandMessageParts/aiGatewayRowsFromProjectedExchange unit test with projection.model = 'exchange-model' and message.model = 'msg-model' asserting row.model === 'msg-model', plus an assistant row with no per-message model to lock the fallback direction.

Reports: /Users/phil/workspace/hypaware/.git/dual-review/pr-134

…ref

…ref Resolves the findings from the dual-agent review of this PR. No behavior change — the model expression is untouched; only docs, a test, and annotations are added. - Codex (contract fidelity): make the live-vs-backfill model granularity explicit. Live capture has one model per exchange and stamps it on every row; backfill surfaces the per-line message.model, which the transcript records on assistant lines only, so backfilled user/tool_result rows carry no model (assistant-output-only). Documented in the AiGatewayProjectedMessage and TranscriptEntry doc comments, a new LLP 0026 Consequences bullet, and the backfill test's user-row assertion. - Claude (untested precedence): add an ai-gateway projector test asserting the per-message model wins over a *different* exchange model and falls back to the exchange model when absent — reversing the ?? operands now fails a test. - Claude (missing annotation): @ref LLP 0026 on the transcript producer (transcriptModel, #decision) and the gateway model fallback (#consequences). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

philcunliffe · 2026-06-23T22:59:50Z

Addressed the dual-review findings in 57b0da1 (no behavior change — only docs, a test, and @ref annotations; the model expression is untouched):

Codex — Contract & Interface Fidelity (major): made the live-vs-backfill model granularity explicit rather than changing it. Live capture has one model per exchange and stamps it on every row; backfill surfaces the per-line message.model, which the transcript records on assistant lines only, so backfilled user-prompt/tool_result rows carry no model — backfill model fidelity is assistant-output-only. Documented in the AiGatewayProjectedMessage/TranscriptEntry doc comments, a new LLP 0026 Consequences bullet, and the backfill test's user-row assertion. (Propagating a responding model onto its user turn is ambiguous after a mid-turn switch, so it's deliberately out of scope.)
Claude — untested precedence (minor): added an ai-gateway-message-projector test that drives an exchange whose exchange-level model differs from a message's own model, asserting the per-message model wins and that an absent per-message model falls back to the exchange model. Reversing the ?? operands now fails a test.
Claude — missing annotation (minor): added @ref LLP 0026 on the transcript producer (transcriptModel, #decision) and the gateway model fallback (#consequences).

All touched test files green (45/45). The decision-map was skipped on the prior run because of the major; that's now resolved.

log claude model in backfill

d3229ad

bgmcmullen requested a review from philcunliffe June 23, 2026 19:26

philcunliffe merged commit 80a2215 into master Jun 25, 2026
6 checks passed

philcunliffe deleted the log-claude-model-in-backfill branch June 25, 2026 17:05

philcunliffe mentioned this pull request Jun 25, 2026

Add programmatic integration surface for embedding hosts #144

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Log claude model in backfill#134

Log claude model in backfill#134
philcunliffe merged 2 commits into
masterfrom
log-claude-model-in-backfill

bgmcmullen commented Jun 23, 2026

Uh oh!

philcunliffe commented Jun 23, 2026

Fix Validations

Backfilled Claude assistant rows missing model

Synthetic Claude assistant rows should not emit a fake model

Findings

2) Contract & Interface Fidelity

No Finding

Evidence Bundle

Claude review

Per-message-model decision lacks the mandated `@ref LLP 0026` annotation

Per-message-WINS-over-exchange-model precedence is not actually tested

Uh oh!

philcunliffe commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

bgmcmullen commented Jun 23, 2026

Uh oh!

philcunliffe commented Jun 23, 2026

Dual-agent review — request_changes

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Fix Validations

Backfilled Claude assistant rows missing model

Synthetic Claude assistant rows should not emit a fake model

Findings

2) Contract & Interface Fidelity

No Finding

Evidence Bundle

Claude review

Per-message-model decision lacks the mandated @ref LLP 0026 annotation

Per-message-WINS-over-exchange-model precedence is not actually tested

Uh oh!

philcunliffe commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Dual-agent review — `request_changes`

Per-message-model decision lacks the mandated `@ref LLP 0026` annotation