Skip to content

Fix: serialize optional pydantic dependency#724

Merged
abhishekg999 merged 2 commits intostagingfrom
ahh/serialize-fix
Mar 25, 2026
Merged

Fix: serialize optional pydantic dependency#724
abhishekg999 merged 2 commits intostagingfrom
ahh/serialize-fix

Conversation

@abhishekg999
Copy link
Contributor

@abhishekg999 abhishekg999 commented Mar 25, 2026

@abhishekg999 abhishekg999 changed the base branch from main to staging March 25, 2026 18:03
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 potential issues.

View 6 additional findings in Devin Review.

Open in Devin Review

Copy link

@propel-code-bot propel-code-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review found no issues; changes look good.

Status: No Issues Found | Risk: Low

Review Details

📁 1 files reviewed | 💬 0 comments

@propel-code-bot
Copy link

✔️ Propel has finished reviewing this change.

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 new potential issues.

🐛 4 issues in files not directly in the diff

🐛 Indentation error causes NameError for result_metadata on non-ResultMessage messages (src/judgeval/v1/integrations/claude_agent_sdk/wrapper.py:263-268)

The if result_metadata: block at line 263 is dedented to the async for message loop level (20 spaces) instead of being inside the elif message_type == "ResultMessage": block (24 spaces). This means it executes for every message type. Since result_metadata is only assigned inside the elif message_type == "ResultMessage": branch, when the first message is an AssistantMessage (the typical case), a NameError: name 'result_metadata' is not defined is raised, crashing the Claude Agent SDK integration.

Indentation comparison with old code

The old code at wrapper.py:294-298 had if result_metadata: indented at 24 spaces (inside the elif block). The new code at line 263 has it at 20 spaces (at the for-loop level, outside the elif), making it a separate if statement that runs unconditionally on every loop iteration.


🐛 Race condition: BackgroundQueue._futures set mutated during force_flush iteration (src/judgeval/v1/background_queue.py:61)

force_flush at line 61 passes self._futures (a plain set) directly to concurrent.futures.wait(), which iterates over it. Concurrently, _on_done callbacks (invoked from thread pool threads via future.add_done_callback at background_queue.py:46) call self._futures.discard(future) at line 51, mutating the set. This can cause RuntimeError: Set changed size during iteration. The fix is to snapshot the set (e.g., wait(list(self._futures), ...)) before passing it to wait().


⚠️ JudgmentTracerProvider.get_instance() singleton is not thread-safe (src/judgeval/v1/trace/judgment_tracer_provider.py:109-112)

JudgmentTracerProvider.get_instance() at lines 109-112 uses a check-then-act pattern without synchronization. Two threads calling get_instance() simultaneously when _instance is None can both create separate instances, violating the singleton invariant and potentially causing spans to be lost. Compare with BackgroundQueue.get_instance() at background_queue.py:28-35 which correctly uses a lock with double-checked locking.


⚠️ PromptScorer TypedDict dropped threshold field compared to old version (src/judgeval/v1/internal/api/models/prompt_scorer.py:9-20)

The old PromptScorer TypedDict in the deleted api_types.py (line 260) had a threshold: float field. The new auto-generated src/judgeval/v1/internal/api/models/prompt_scorer.py no longer includes threshold. Code that reads prompt scorers from the API and accesses the threshold field (e.g., the prompt factory or evaluation code) would get a KeyError at runtime. This appears to be a schema generation issue where the field was dropped from the OpenAPI spec or the generator.

View 8 additional findings in Devin Review.

Open in Devin Review

@abhishekg999 abhishekg999 merged commit ba0fba2 into staging Mar 25, 2026
18 checks passed
@abhishekg999 abhishekg999 deleted the ahh/serialize-fix branch March 25, 2026 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants