Fix: serialize optional pydantic dependency by abhishekg999 · Pull Request #724 · JudgmentLabs/judgeval

abhishekg999 · 2026-03-25T18:03:07Z

devin-ai-integration

Devin Review found 4 potential issues.

View 6 additional findings in Devin Review.

propel-code-bot

Review found no issues; changes look good.

Status: No Issues Found | Risk: Low

Review Details

📁 1 files reviewed | 💬 0 comments

propel-code-bot · 2026-03-25T18:21:29Z

✔️ Propel has finished reviewing this change.

devin-ai-integration

Devin Review found 4 new potential issues.

🐛 4 issues in files not directly in the diff

🐛 Indentation error causes `NameError` for `result_metadata` on non-ResultMessage messages (`src/judgeval/v1/integrations/claude_agent_sdk/wrapper.py:263-268`)

The if result_metadata: block at line 263 is dedented to the async for message loop level (20 spaces) instead of being inside the elif message_type == "ResultMessage": block (24 spaces). This means it executes for every message type. Since result_metadata is only assigned inside the elif message_type == "ResultMessage": branch, when the first message is an AssistantMessage (the typical case), a NameError: name 'result_metadata' is not defined is raised, crashing the Claude Agent SDK integration.

Indentation comparison with old code

The old code at wrapper.py:294-298 had if result_metadata: indented at 24 spaces (inside the elif block). The new code at line 263 has it at 20 spaces (at the for-loop level, outside the elif), making it a separate if statement that runs unconditionally on every loop iteration.

🐛 Race condition: `BackgroundQueue._futures` set mutated during `force_flush` iteration (`src/judgeval/v1/background_queue.py:61`)

force_flush at line 61 passes self._futures (a plain set) directly to concurrent.futures.wait(), which iterates over it. Concurrently, _on_done callbacks (invoked from thread pool threads via future.add_done_callback at background_queue.py:46) call self._futures.discard(future) at line 51, mutating the set. This can cause RuntimeError: Set changed size during iteration. The fix is to snapshot the set (e.g., wait(list(self._futures), ...)) before passing it to wait().

⚠️ JudgmentTracerProvider.get_instance() singleton is not thread-safe (`src/judgeval/v1/trace/judgment_tracer_provider.py:109-112`)

JudgmentTracerProvider.get_instance() at lines 109-112 uses a check-then-act pattern without synchronization. Two threads calling get_instance() simultaneously when _instance is None can both create separate instances, violating the singleton invariant and potentially causing spans to be lost. Compare with BackgroundQueue.get_instance() at background_queue.py:28-35 which correctly uses a lock with double-checked locking.

⚠️ PromptScorer TypedDict dropped `threshold` field compared to old version (`src/judgeval/v1/internal/api/models/prompt_scorer.py:9-20`)

The old PromptScorer TypedDict in the deleted api_types.py (line 260) had a threshold: float field. The new auto-generated src/judgeval/v1/internal/api/models/prompt_scorer.py no longer includes threshold. Code that reads prompt scorers from the API and accesses the threshold field (e.g., the prompt factory or evaluation code) would get a KeyError at runtime. This appears to be a schema generation issue where the field was dropped from the OpenAPI spec or the generator.

View 8 additional findings in Devin Review.

format

69e2fd3

abhishekg999 requested a review from alanzhang25 March 25, 2026 18:03

abhishekg999 changed the base branch from main to staging March 25, 2026 18:03

devin-ai-integration bot reviewed Mar 25, 2026

View reviewed changes

propel-code-bot bot reviewed Mar 25, 2026

View reviewed changes

Merge branch 'staging' into ahh/serialize-fix

637e77d

devin-ai-integration bot reviewed Mar 25, 2026

View reviewed changes

alanzhang25 approved these changes Mar 25, 2026

View reviewed changes

abhishekg999 merged commit ba0fba2 into staging Mar 25, 2026
18 checks passed

abhishekg999 deleted the ahh/serialize-fix branch March 25, 2026 19:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: serialize optional pydantic dependency#724

Fix: serialize optional pydantic dependency#724
abhishekg999 merged 2 commits intostagingfrom
ahh/serialize-fix

abhishekg999 commented Mar 25, 2026 •

edited by devin-ai-integration bot

Loading

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

propel-code-bot bot left a comment

Uh oh!

propel-code-bot bot commented Mar 25, 2026

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

abhishekg999 commented Mar 25, 2026 • edited by devin-ai-integration bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

Uh oh!

propel-code-bot bot left a comment

Choose a reason for hiding this comment

Uh oh!

propel-code-bot bot commented Mar 25, 2026

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

🐛 Indentation error causes NameError for result_metadata on non-ResultMessage messages (src/judgeval/v1/integrations/claude_agent_sdk/wrapper.py:263-268)

🐛 Race condition: BackgroundQueue._futures set mutated during force_flush iteration (src/judgeval/v1/background_queue.py:61)

⚠️ JudgmentTracerProvider.get_instance() singleton is not thread-safe (src/judgeval/v1/trace/judgment_tracer_provider.py:109-112)

⚠️ PromptScorer TypedDict dropped threshold field compared to old version (src/judgeval/v1/internal/api/models/prompt_scorer.py:9-20)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

abhishekg999 commented Mar 25, 2026 •

edited by devin-ai-integration bot

Loading

🐛 Indentation error causes `NameError` for `result_metadata` on non-ResultMessage messages (`src/judgeval/v1/integrations/claude_agent_sdk/wrapper.py:263-268`)

🐛 Race condition: `BackgroundQueue._futures` set mutated during `force_flush` iteration (`src/judgeval/v1/background_queue.py:61`)

⚠️ JudgmentTracerProvider.get_instance() singleton is not thread-safe (`src/judgeval/v1/trace/judgment_tracer_provider.py:109-112`)

⚠️ PromptScorer TypedDict dropped `threshold` field compared to old version (`src/judgeval/v1/internal/api/models/prompt_scorer.py:9-20`)