Fix: serialize optional pydantic dependency#724
Conversation
|
✔️ Propel has finished reviewing this change. |
There was a problem hiding this comment.
Devin Review found 4 new potential issues.
🐛 4 issues in files not directly in the diff
🐛 Indentation error causes NameError for result_metadata on non-ResultMessage messages (src/judgeval/v1/integrations/claude_agent_sdk/wrapper.py:263-268)
The if result_metadata: block at line 263 is dedented to the async for message loop level (20 spaces) instead of being inside the elif message_type == "ResultMessage": block (24 spaces). This means it executes for every message type. Since result_metadata is only assigned inside the elif message_type == "ResultMessage": branch, when the first message is an AssistantMessage (the typical case), a NameError: name 'result_metadata' is not defined is raised, crashing the Claude Agent SDK integration.
Indentation comparison with old code
The old code at wrapper.py:294-298 had if result_metadata: indented at 24 spaces (inside the elif block). The new code at line 263 has it at 20 spaces (at the for-loop level, outside the elif), making it a separate if statement that runs unconditionally on every loop iteration.
🐛 Race condition: BackgroundQueue._futures set mutated during force_flush iteration (src/judgeval/v1/background_queue.py:61)
force_flush at line 61 passes self._futures (a plain set) directly to concurrent.futures.wait(), which iterates over it. Concurrently, _on_done callbacks (invoked from thread pool threads via future.add_done_callback at background_queue.py:46) call self._futures.discard(future) at line 51, mutating the set. This can cause RuntimeError: Set changed size during iteration. The fix is to snapshot the set (e.g., wait(list(self._futures), ...)) before passing it to wait().
⚠️ JudgmentTracerProvider.get_instance() singleton is not thread-safe (src/judgeval/v1/trace/judgment_tracer_provider.py:109-112)
JudgmentTracerProvider.get_instance() at lines 109-112 uses a check-then-act pattern without synchronization. Two threads calling get_instance() simultaneously when _instance is None can both create separate instances, violating the singleton invariant and potentially causing spans to be lost. Compare with BackgroundQueue.get_instance() at background_queue.py:28-35 which correctly uses a lock with double-checked locking.
⚠️ PromptScorer TypedDict dropped threshold field compared to old version (src/judgeval/v1/internal/api/models/prompt_scorer.py:9-20)
The old PromptScorer TypedDict in the deleted api_types.py (line 260) had a threshold: float field. The new auto-generated src/judgeval/v1/internal/api/models/prompt_scorer.py no longer includes threshold. Code that reads prompt scorers from the API and accesses the threshold field (e.g., the prompt factory or evaluation code) would get a KeyError at runtime. This appears to be a schema generation issue where the field was dropped from the OpenAPI spec or the generator.
View 8 additional findings in Devin Review.
Uh oh!
There was an error while loading. Please reload this page.