fix(ocr): replace eval() with ast.literal_eval in deepseek_ocr coordinate parsing #4872

Open
Ricardo-M-L wants to merge 2 commits into xorbitsai:main from Ricardo-M-L:fix/deepseek-ocr-eval-rce

Conversation

@Ricardo-M-L
Contributor

What

`xinference/model/image/ocr/deepseek_ocr.py` had two `eval()` calls in the coordinate-extraction helpers:

| Site | Old code |
| --- | --- |
| `extract_coordinates_and_label` (L231) | `cor_list = eval(ref_text[2])` |
| `extract_text_blocks` (L399) | `coords = eval(f"[{coords_str}]")` |

In both, the string passed to `eval()` is a substring extracted from the OCR model's text output. OCR output is LLM-generated and attacker-influenceable through the source document or prompt, so an OCR response shaped like

```
<|ref|>x<|/ref|><|det|>[[__import__('os').system('id')]]<|/det|>
```

would have its payload executed by the host process when the result is parsed for downstream rendering / block extraction.

Why this matters

Same RCE shape as the merged tool-parser fix #4786 — that PR covered `xinference/model/llm/` but did not reach the image OCR path. Anyone running the deepseek_ocr model on attacker-influenced documents would silently be executing whatever Python the model emits inside `<|det|>...<|/det|>` brackets.

Fix

Replace both calls with `ast.literal_eval`, which only accepts Python literal structures (lists, tuples, numbers, strings, …) and refuses calls, attribute access, and imports. The expected coordinate strings are list literals like `[[10, 20, 30, 40]]`, which `literal_eval` parses identically to `eval`, so behavior on legitimate input is unchanged.
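The behavioral claim is easy to check standalone; a minimal sketch (not the PR's code, just `ast.literal_eval` on representative inputs):

```python
import ast

# Legitimate coordinate string: parses to nested lists, exactly as eval() would.
coords = ast.literal_eval("[[10, 20, 30, 40]]")
print(coords)  # [[10, 20, 30, 40]]

# Malicious string: literal_eval refuses function-call nodes and raises
# ValueError instead of executing anything.
try:
    ast.literal_eval("[[__import__('os').system('id')]]")
except ValueError:
    print("rejected")
```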

Exception handling is also tightened from a bare `except Exception` to the specific `ValueError` / `SyntaxError` that `literal_eval` raises (plus `IndexError` / `TypeError` in the first helper, which indexes `ref_text[1]`/`ref_text[2]`).
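The tightened pattern looks roughly like this (the helper name here is illustrative, not the actual code in `deepseek_ocr.py`):

```python
import ast

def parse_coords(coords_str):
    """Parse a coordinate list literal; return None on malformed input.

    Illustrative sketch of the narrowed exception handling the PR
    describes, not the real deepseek_ocr helper.
    """
    try:
        return ast.literal_eval(f"[{coords_str}]")
    except (ValueError, SyntaxError):
        # ValueError: non-literal node (e.g. a function-call payload)
        # SyntaxError: truncated or otherwise malformed input
        return None
```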

```python
# Before
cor_list = eval(ref_text[2])

# After
# OCR output is LLM-generated and attacker-influenceable. literal_eval
# only accepts Python literal structures and rejects code execution.
cor_list = ast.literal_eval(ref_text[2])
```

Tests

Adds `xinference/model/image/ocr/tests/test_deepseek_ocr_safe_eval.py` with cases that pin the security guarantee:

  • ✅ Legitimate single and multi coordinate lists still parse.
  • ✅ A payload of `__import__('pathlib').Path(...).write_text(...)` is rejected with no side effect (verified by checking a tmp_path sentinel file is never created).
  • ✅ Attribute-walking payloads (`().__class__.__bases__[0]...`) are rejected.
  • ✅ Malformed truncated input does not crash callers.
  • ✅ A malicious block followed by a valid block in the same OCR response still allows the valid block to surface (parser does not abort on a bad block).
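The sentinel-file case can be sketched standalone (a hypothetical re-creation of the idea, not the PR's actual test module):

```python
import ast
import pathlib
import tempfile

def check_payload_has_no_side_effect():
    """Feed literal_eval a write_text payload; assert the file never appears."""
    tmp = pathlib.Path(tempfile.mkdtemp())
    sentinel = tmp / "pwned.txt"
    payload = f"[[__import__('pathlib').Path({str(sentinel)!r}).write_text('x')]]"
    try:
        ast.literal_eval(payload)
    except ValueError:
        pass  # rejection is the expected path
    # The payload must have been rejected, never executed.
    assert not sentinel.exists()
```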

The test module `pytest.importorskip`s `torch` / `PIL` / `torchvision` so it is safely skipped in environments without the OCR runtime, but runs fully in CI.

Notes

`ruff check` passes on all touched files.

@XprobeBot XprobeBot added the bug Something isn't working label May 4, 2026
@XprobeBot XprobeBot added this to the v2.x milestone May 4, 2026
@qinxuye
Contributor

qinxuye commented May 5, 2026

Tests failed.
