fix(ocr): replace eval() with ast.literal_eval in deepseek_ocr coordinate parsing #4872

Open
Ricardo-M-L wants to merge 2 commits into xorbitsai:main from Ricardo-M-L:fix/deepseek-ocr-eval-rce

Conversation

@Ricardo-M-L
Contributor

What

`xinference/model/image/ocr/deepseek_ocr.py` had two `eval()` calls in the coordinate-extraction helpers:

| Site | Old code |
| --- | --- |
| `extract_coordinates_and_label` (L231) | `cor_list = eval(ref_text[2])` |
| `extract_text_blocks` (L399) | `coords = eval(f"[{coords_str}]")` |

In both, the string passed to `eval()` is a substring extracted from the OCR model's text output. OCR output is LLM-generated and attacker-influenceable through the source document or prompt, so an OCR response shaped like

```
<|ref|>x<|/ref|><|det|>[[__import__('os').system('id')]]<|/det|>
```

would have its payload executed by the host process when the result is parsed for downstream rendering / block extraction.

Why this matters

Same RCE shape as the merged tool-parser fix #4786 — that PR covered `xinference/model/llm/` but did not reach the image OCR path. Anyone running the deepseek_ocr model on attacker-influenced documents would silently be executing whatever Python the model emits inside `<|det|>...<|/det|>` brackets.

Fix

Replace both calls with `ast.literal_eval`, which only accepts Python literal structures (lists, tuples, numbers, strings, …) and refuses calls, attribute access, and imports. The expected coordinate strings are list literals like `[[10, 20, 30, 40]]`, which `literal_eval` parses identically to `eval`, so behavior on legitimate input is unchanged.
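The behavioral claim is easy to check standalone; a minimal sketch (not the PR's code, just `ast.literal_eval` on representative inputs):

```python
import ast

# Legitimate coordinate string: parses to nested lists, exactly as eval() would.
coords = ast.literal_eval("[[10, 20, 30, 40]]")
print(coords)  # [[10, 20, 30, 40]]

# Malicious string: literal_eval refuses function-call nodes and raises
# ValueError instead of executing anything.
try:
    ast.literal_eval("[[__import__('os').system('id')]]")
except ValueError:
    print("rejected")
```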

Exception handling is also tightened from a bare `except Exception` to the specific `ValueError` / `SyntaxError` that `literal_eval` raises (plus `IndexError` / `TypeError` in the first helper, which indexes `ref_text[1]`/`ref_text[2]`).
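The tightened pattern looks roughly like this (the helper name here is illustrative, not the actual code in `deepseek_ocr.py`):

```python
import ast

def parse_coords(coords_str):
    """Parse a coordinate list literal; return None on malformed input.

    Illustrative sketch of the narrowed exception handling the PR
    describes, not the real deepseek_ocr helper.
    """
    try:
        return ast.literal_eval(f"[{coords_str}]")
    except (ValueError, SyntaxError):
        # ValueError: non-literal node (e.g. a function-call payload)
        # SyntaxError: truncated or otherwise malformed input
        return None
```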

```python
# Before
cor_list = eval(ref_text[2])

# After
# OCR output is LLM-generated and attacker-influenceable. literal_eval
# only accepts Python literal structures and rejects code execution.
cor_list = ast.literal_eval(ref_text[2])
```

Tests

Adds `xinference/model/image/ocr/tests/test_deepseek_ocr_safe_eval.py` with cases that pin the security guarantee:

  • ✅ Legitimate single and multi coordinate lists still parse.
  • ✅ A payload of `__import__('pathlib').Path(...).write_text(...)` is rejected with no side effect (verified by checking a tmp_path sentinel file is never created).
  • ✅ Attribute-walking payloads (`().__class__.__bases__[0]...`) are rejected.
  • ✅ Malformed truncated input does not crash callers.
  • ✅ A malicious block followed by a valid block in the same OCR response still allows the valid block to surface (parser does not abort on a bad block).
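The sentinel-file case can be sketched standalone (a hypothetical re-creation of the idea, not the PR's actual test module):

```python
import ast
import pathlib
import tempfile

def check_payload_has_no_side_effect():
    """Feed literal_eval a write_text payload; assert the file never appears."""
    tmp = pathlib.Path(tempfile.mkdtemp())
    sentinel = tmp / "pwned.txt"
    payload = f"[[__import__('pathlib').Path({str(sentinel)!r}).write_text('x')]]"
    try:
        ast.literal_eval(payload)
    except ValueError:
        pass  # rejection is the expected path
    # The payload must have been rejected, never executed.
    assert not sentinel.exists()
```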

The test module `pytest.importorskip`s `torch` / `PIL` / `torchvision` so it is safely skipped in environments without the OCR runtime, but runs fully in CI.

Notes

`ruff check` passes on all touched files.

@XprobeBot XprobeBot added the bug Something isn't working label May 4, 2026
@XprobeBot XprobeBot added this to the v2.x milestone May 4, 2026
@qinxuye
Contributor

qinxuye commented May 5, 2026

Tests failed.
