
fix: handle ValidationError from JSON pattern false positives in converter #5576

Open
VANDRANKI wants to merge 1 commit into crewAIInc:main from VANDRANKI:fix/converter-json-pattern-validation-error

Fixes #5460.

Bug

_JSON_PATTERN uses a greedy {.*} regex with re.DOTALL, so it matches from the first opening brace to the last closing brace of any text that contains curly braces, including GraphQL schemas, template strings, and similar non-JSON text. When such a false-positive match parses without raising json.JSONDecodeError (because it happens to be syntactically valid JSON) but then fails Pydantic model validation, the ValidationError was re-raised directly, crashing the task.

Changes

  1. Regex: greedy to non-greedy - {.*} becomes {.*?} to prefer the shortest match, reducing the chance of gobbling up large non-JSON blocks.

  2. ValidationError: raise to pass - if model_validate_json fails schema validation on a regex match, the match was likely a false positive. Falling through to convert_with_instructions lets the LLM reformat the output correctly instead of crashing.
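To make the first change concrete, here is a small sketch comparing the two patterns. The patterns are reconstructed from the description above (with the braces escaped for readability), and the sample strings are invented for illustration; the converter's actual pattern and inputs may differ.

```python
import json
import re

# Assumption: reconstructed from the PR description, not copied from the converter.
GREEDY = re.compile(r"\{.*\}", re.DOTALL)
NON_GREEDY = re.compile(r"\{.*?\}", re.DOTALL)


def is_json(candidate: str) -> bool:
    """Return True if the candidate string parses as JSON."""
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return False
    return True


# Invented sample: a GraphQL-like fragment followed by a real JSON object.
text = 'type Query { id: ID }\nResult: {"name": "crew", "size": 3}'

# Greedy: spans from the first "{" to the last "}" in the text.
greedy_match = GREEDY.search(text).group(0)
# Non-greedy: stops at the first "}" after the first "{".
non_greedy_match = NON_GREEDY.search(text).group(0)

# Nested JSON: the non-greedy pattern cuts the object off at the inner "}".
nested = '{"a": {"b": 1}}'
nested_greedy = GREEDY.search(nested).group(0)
nested_non_greedy = NON_GREEDY.search(nested).group(0)

print(repr(greedy_match), is_json(greedy_match))
print(repr(non_greedy_match), is_json(non_greedy_match))
print(repr(nested_greedy), is_json(nested_greedy))
print(repr(nested_non_greedy), is_json(nested_non_greedy))
```

In this sample neither pattern isolates the real JSON object, and on nested JSON the non-greedy pattern returns a truncated string that no longer parses. Both cases would then depend on the fallback path described in the second change.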

Why this is safe

convert_with_instructions is the right fallback for any output the regex misidentifies - it re-prompts the LLM to produce the correct schema. This is already the path taken for json.JSONDecodeError, so treating ValidationError the same way is consistent.
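The control flow described here can be sketched roughly as follows. This is not the converter's code: SchemaError is a hypothetical stand-in for Pydantic's ValidationError, validate_person stands in for model_validate_json, and reformat_with_llm stands in for convert_with_instructions, so that the example runs with the standard library alone.

```python
import json
import re

# Assumption: non-greedy pattern as described in this PR.
PATTERN = re.compile(r"\{.*?\}", re.DOTALL)


class SchemaError(Exception):
    """Hypothetical stand-in for pydantic.ValidationError."""


def validate_person(raw: str) -> dict:
    """Hypothetical stand-in for model_validate_json on a model with a 'name' field."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise SchemaError("expected an object with a string 'name' field")
    return data


def reformat_with_llm(text: str) -> dict:
    """Hypothetical stand-in for convert_with_instructions (no LLM is called here)."""
    return {"name": "from-fallback"}


def convert(text: str) -> dict:
    match = PATTERN.search(text)
    if match:
        try:
            return validate_person(match.group(0))
        except json.JSONDecodeError:
            pass  # not JSON at all: fall through to the fallback
        except SchemaError:
            pass  # valid JSON, wrong schema: previously re-raised, now falls through
    return reformat_with_llm(text)


print(convert('{"name": "Ada"}'))           # regex match validates directly
print(convert('query { user }'))            # match is not JSON, fallback is used
print(convert('config {"id": 1} follows'))  # match is JSON but fails the schema, fallback is used
```

The point of the sketch is that both exception types end at the same fallback call, which is the consistency argument made above.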

…erter

The _JSON_PATTERN regex uses a greedy {.*} match (DOTALL) that can
match any curly-brace content, including GraphQL schemas, template
strings, and other non-JSON text. When such a false-positive match
parses without a json.JSONDecodeError (it is syntactically valid JSON)
but then fails Pydantic model validation, the ValidationError was
re-raised directly, crashing the task instead of falling through to
convert_with_instructions.

Two changes:
- Make the regex non-greedy ({.*?}) to prefer the shortest match.
- Catch ValidationError with pass instead of raise so that false-positive
  regex matches fall through to the LLM-based converter, which handles
  the output correctly.

Fixes crewAIInc#5460

Co-Authored-By: kalfa <[email protected]>
Development

Successfully merging this pull request may close these issues.

[BUG] when using Task(..., output_pydantic=MyModel) some JSON substring identification within the data occurs with possible false positive
