fix(ocr): replace eval() with ast.literal_eval in deepseek_ocr coordinate parsing#4872
Open
Ricardo-M-L wants to merge 2 commits intoxorbitsai:mainfrom
Open
fix(ocr): replace eval() with ast.literal_eval in deepseek_ocr coordinate parsing#4872Ricardo-M-L wants to merge 2 commits intoxorbitsai:mainfrom
Ricardo-M-L wants to merge 2 commits intoxorbitsai:mainfrom
Conversation
…nate parsing
`xinference/model/image/ocr/deepseek_ocr.py` had two `eval()` calls in
the coordinate-extraction helpers — `extract_coordinates_and_label`
(line 231) and `extract_text_blocks` (line 399). Both `eval()` their
input directly:
```python
cor_list = eval(ref_text[2]) # ref_text from OCR output
coords = eval(f"[{coords_str}]") # coords_str from OCR output
```
The argument to each `eval()` is a substring extracted from the OCR
model's text output. Because OCR output is LLM-generated and
attacker-influenceable through the source document or prompt, an OCR
response of the form
```
<|ref|>x<|/ref|><|det|>[[__import__('os').system('id')]]<|/det|>
```
would have its payload **executed by the host process** when the result
is parsed for downstream rendering / block extraction.
Replace both calls with `ast.literal_eval`, which only accepts Python
literal structures (lists, tuples, numbers, strings, …) and refuses
calls, attribute access, and imports. The expected coordinate strings
are list literals like `[[10, 20, 30, 40]]`, which `literal_eval`
parses identically to `eval`, so behavior on legitimate input is
unchanged. Exception handling is tightened to the specific
`ValueError` / `SyntaxError` that `literal_eval` raises (plus
`IndexError`/`TypeError` in the first helper because it indexes
`ref_text[1]`/`ref_text[2]`).
This is the same fix shape as the merged tool-parser PR xorbitsai#4786 — that
one covered `xinference/model/llm/`; this PR covers the image OCR
path that xorbitsai#4786 didn't reach.
Adds `xinference/model/image/ocr/tests/test_deepseek_ocr_safe_eval.py`
with cases that pin the security guarantee:
- Legitimate single and multi coordinate lists still parse.
- A payload of `__import__('pathlib').Path(...).write_text(...)` is
rejected with no side effect (verified by checking a tmp_path
sentinel file is never created).
- Attribute-walking payloads (`().__class__.__bases__[0]...`) are
rejected.
- Malformed truncated input does not crash callers.
- A malicious block followed by a valid block in the same OCR
response still allows the valid block to surface.
Contributor
|
Tests failed. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
`xinference/model/image/ocr/deepseek_ocr.py` had two `eval()` calls in the coordinate-extraction helpers:
In both, the string passed to `eval()` is a substring extracted from the OCR model's text output. OCR output is LLM-generated and attacker-influenceable through the source document or prompt, so an OCR response shaped like
```
<|ref|>x<|/ref|><|det|>[[import('os').system('id')]]<|/det|>
```
would have its payload executed by the host process when the result is parsed for downstream rendering / block extraction.
Why this matters
Same RCE shape as the merged tool-parser fix #4786 — that PR covered `xinference/model/llm/` but did not reach the image OCR path. Anyone running the deepseek_ocr model on attacker-influenced documents would silently be executing whatever Python the model emits inside `<|det|>...<|/det|>` brackets.
Fix
Replace both calls with `ast.literal_eval`, which only accepts Python literal structures (lists, tuples, numbers, strings, …) and refuses calls, attribute access, and imports. The expected coordinate strings are list literals like `[[10, 20, 30, 40]]`, which `literal_eval` parses identically to `eval`, so behavior on legitimate input is unchanged.
Exception handling is also tightened from a bare `except Exception` to the specific `ValueError` / `SyntaxError` that `literal_eval` raises (plus `IndexError` / `TypeError` in the first helper, which indexes `ref_text[1]`/`ref_text[2]`).
```python
Before
cor_list = eval(ref_text[2])
After
OCR output is LLM-generated and attacker-influenceable. literal_eval
only accepts Python literal structures and rejects code execution.
cor_list = ast.literal_eval(ref_text[2])
```
Tests
Adds `xinference/model/image/ocr/tests/test_deepseek_ocr_safe_eval.py` with cases that pin the security guarantee:
The test module `pytest.importorskip`s `torch` / `PIL` / `torchvision` so it is safely skipped in environments without the OCR runtime, but runs fully in CI.
Notes
`ruff check` passes on all touched files.