Skip to content

Log claude model in backfill#134

Merged
philcunliffe merged 2 commits into
masterfrom
log-claude-model-in-backfill
Jun 25, 2026
Merged

Log claude model in backfill#134
philcunliffe merged 2 commits into
masterfrom
log-claude-model-in-backfill

Conversation

@bgmcmullen

Copy link
Copy Markdown
Contributor

Our backfilled claude logs were missing the model. This fixes it.

@bgmcmullen bgmcmullen requested a review from philcunliffe June 23, 2026 19:26
@philcunliffe

Copy link
Copy Markdown
Contributor

Dual-agent review — request_changes

  • Verdict: request_changes
  • Risk class: medium
  • Auto-merge advisory: 👎 thumbs down — verdict is request_changes; needs human-gated follow-up

Advisory only: no merge was attempted. Reviewed in an isolated worktree at the PR head; your working tree was untouched.

Note on the verdict: the single major is Codex's contract-fidelity finding (backfilled user/request rows stay model-less while live capture stamps the exchange model on them). Both independent Claude reviewers traced this as a pre-existing divergence (backfill never carried an exchange-level model) that this PR does not introduce or worsen — the change strictly improves assistant-row accuracy. The mechanical rubric still counts a category 1–8 major as request_changes; treat it as "confirm the user-row policy is intended," not "the code is wrong." Both remaining Claude findings are minor.

Risk capstone

Cross-reference: reviewer findings vs high-risk surfaces

Source Finding (severity, evidence) Intersects
Codex Contract & Interface Fidelity — backfilled user/request rows model-less vs live (major, conf medium; message_projector.js:646/655, claude/types.d.ts:38) Risks → live-vs-backfill divergence; Cross-package usage (claude→ai-gateway contract). Two Claude reviewers independently classified it as pre-existing/by-design, not introduced by this PR.
Claude Missing @ref LLP 0026 annotation (minor, conf 85; message_projector.js:652, transcripts.js:528, backfill.js:394) Targets (all touched symbols); LLP-governance surface, not a behavioral risk.
Claude Per-message-WINS precedence untested (minor, conf 88; claude-backfill.test.js:368, message_projector.js:655) Risks → precedence unverified (direct hit).
Codex review

Fix Validations

Backfilled Claude assistant rows missing model

  • Status: correct
  • Evidence: hypaware-core/plugins-workspace/claude/src/transcripts.js:452, hypaware-core/plugins-workspace/claude/src/backfill.js:394, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:652, test/plugins/claude-backfill.test.js:422
  • Assessment: The model is read from message.model, copied onto the projected message, and preferred during gateway row expansion. The regression test covers two assistant rows with different models.

Synthetic Claude assistant rows should not emit a fake model

  • Status: correct
  • Evidence: hypaware-core/plugins-workspace/claude/src/transcripts.js:535, test/plugins/claude-backfill.test.js:424
  • Assessment: <synthetic> is filtered before projection, so it cannot land in the gateway model column.

Findings

2) Contract & Interface Fidelity

  • Severity: major
  • Confidence: medium
  • Evidence: hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:646, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:655, hypaware-core/plugins-workspace/claude/src/types.d.ts:38, test/plugins/claude-backfill.test.js:426
  • Why it matters: Live gateway expansion puts the exchange model on every message-part row, but this PR intentionally leaves backfilled user rows without a model, so backfilled prompt/user rows still diverge from live capture and remain unqueryable by model.
  • Suggested fix: Either propagate the associated assistant response model onto the corresponding backfilled user/request rows, or update the contract/commentary and tests to state that backfill model fidelity is assistant-output-only.

No Finding

  1. Behavioral Correctness
  2. Change Impact / Blast Radius
  3. Concurrency, Ordering & State Safety
  4. Error Handling & Resilience
  5. Security Surface
  6. Resource Lifecycle & Cleanup
  7. Release Safety
  8. Test Evidence Quality
  9. Architectural Consistency
  10. Debuggability & Operability

Evidence Bundle

  • Changed hot paths: transcript JSONL parsing, Claude backfill projection, AI gateway message-part expansion, Claude backfill regression tests.
  • Impacted callers: hypaware-core/plugins-workspace/claude/src/transcripts.js:416, hypaware-core/plugins-workspace/claude/src/backfill.js:306, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:451.
  • Impacted tests: test/plugins/claude-backfill.test.js:368.
  • Unresolved uncertainty: I did not run the test suite; review is based on the supplied diff plus targeted worktree tracing.
Claude review

Claude review

Per-message-model decision lacks the mandated @ref LLP 0026 annotation

  • Severity: minor
  • Confidence: 85
  • Evidence: hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:652, hypaware-core/plugins-workspace/claude/src/transcripts.js:528, hypaware-core/plugins-workspace/claude/src/backfill.js:394
  • Why it matters: CLAUDE.md requires // @ref LLP NNNN#anchor on code realizing a non-obvious documented decision; "per-message model wins over the exchange model" and "<synthetic> is a sentinel, drop it" are exactly that (LLP 0026 #decision/#consequences), and every sibling field in these same functions already carries a ref (e.g. message_projector.js:580/663, transcripts.js:187/235), so the omission leaves a gap in the /ref-story rationale view precisely where a governed decision was implemented.
  • Suggested fix: Add // @ref LLP 0026#consequences [implements] — per-message model mirrors the transcript's per-line model; exchange model is the fallback above the model: line in message_projector.js, and // @ref LLP 0026#decision — per-line model; <synthetic> is a local-only sentinel, not a model id above transcriptModel in transcripts.js. Both anchors already exist in llp/0026-claude-native-granularity.decision.md.

Per-message-WINS-over-exchange-model precedence is not actually tested

  • Severity: minor
  • Confidence: 88
  • Evidence: test/plugins/claude-backfill.test.js:368, hypaware-core/plugins-workspace/ai-gateway/src/message_projector.js:655
  • Why it matters: The new test only exercises the case where the exchange-level model is undefined (claude backfill never sets projection.model), so it proves "per-message model surfaces" but not "per-message beats a different exchange model" — reversing the operands to ctx.projection.model ?? stringValue(ctx.message.model) would leave every assertion green, so the most load-bearing claim in the diff (asserted in three comments and the .d.ts) is unverified.
  • Suggested fix: Add one assertion where projection.model differs from the per-message model — e.g. a direct expandMessageParts/aiGatewayRowsFromProjectedExchange unit test with projection.model = 'exchange-model' and message.model = 'msg-model' asserting row.model === 'msg-model', plus an assistant row with no per-message model to lock the fallback direction.

Reports: /Users/phil/workspace/hypaware/.git/dual-review/pr-134

…ref

Resolves the findings from the dual-agent review of this PR. No behavior
change — the model expression is untouched; only docs, a test, and
annotations are added.

- Codex (contract fidelity): make the live-vs-backfill model granularity
  explicit. Live capture has one model per exchange and stamps it on every
  row; backfill surfaces the per-line message.model, which the transcript
  records on assistant lines only, so backfilled user/tool_result rows carry
  no model (assistant-output-only). Documented in the AiGatewayProjectedMessage
  and TranscriptEntry doc comments, a new LLP 0026 Consequences bullet, and the
  backfill test's user-row assertion.
- Claude (untested precedence): add an ai-gateway projector test asserting the
  per-message model wins over a *different* exchange model and falls back to
  the exchange model when absent — reversing the ?? operands now fails a test.
- Claude (missing annotation): @ref LLP 0026 on the transcript producer
  (transcriptModel, #decision) and the gateway model fallback (#consequences).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@philcunliffe

Copy link
Copy Markdown
Contributor

Addressed the dual-review findings in 57b0da1 (no behavior change — only docs, a test, and @ref annotations; the model expression is untouched):

  • Codex — Contract & Interface Fidelity (major): made the live-vs-backfill model granularity explicit rather than changing it. Live capture has one model per exchange and stamps it on every row; backfill surfaces the per-line message.model, which the transcript records on assistant lines only, so backfilled user-prompt/tool_result rows carry no model — backfill model fidelity is assistant-output-only. Documented in the AiGatewayProjectedMessage/TranscriptEntry doc comments, a new LLP 0026 Consequences bullet, and the backfill test's user-row assertion. (Propagating a responding model onto its user turn is ambiguous after a mid-turn switch, so it's deliberately out of scope.)
  • Claude — untested precedence (minor): added an ai-gateway-message-projector test that drives an exchange whose exchange-level model differs from a message's own model, asserting the per-message model wins and that an absent per-message model falls back to the exchange model. Reversing the ?? operands now fails a test.
  • Claude — missing annotation (minor): added @ref LLP 0026 on the transcript producer (transcriptModel, #decision) and the gateway model fallback (#consequences).

All touched test files green (45/45). The decision-map was skipped on the prior run because of the major; that's now resolved.

@philcunliffe philcunliffe merged commit 80a2215 into master Jun 25, 2026
6 checks passed
@philcunliffe philcunliffe deleted the log-claude-model-in-backfill branch June 25, 2026 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants