[Feat] Add PaddleOCRLoader for local OCR LangChain integration by Ihebdhouibi · Pull Request #17813 · PaddlePaddle/PaddleOCR

Ihebdhouibi · 2026-03-15T01:09:35Z

Add PaddleOCRLoader, which wraps the local PaddleOCR library (PP-OCRv5 and PP-StructureV3) to produce LangChain Document objects without requiring any cloud API.

New files:

langchain_paddleocr/document_loaders/paddleocr.py: PaddleOCRLoader, PaddleOCRConfig dataclass, custom exception hierarchy
tests/unit_tests/document_loaders/test_paddleocr_loader.py: 29 unit tests
tests/integration_tests/document_loaders/test_paddleocr_loader.py: Integration tests
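
To make the three pieces named above concrete, here is a minimal sketch of how a config dataclass, an exception hierarchy, and a loader might fit together. It is not the PR's code: the `Document` class is a stand-in for LangChain's, the config fields (`lang`, `min_confidence`), the exception names, and the `engine` callable contract are all assumptions made for illustration, and no real PaddleOCR call is made.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document, so the sketch runs without LangChain."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class PaddleOCRLoaderError(Exception):
    """Hypothetical base class: callers can catch every loader failure at once."""


class OCRInputError(PaddleOCRLoaderError):
    """Hypothetical: the input file could not be found."""


class OCREngineError(PaddleOCRLoaderError):
    """Hypothetical: the OCR engine itself failed."""


@dataclass(frozen=True)
class PaddleOCRConfig:
    """Field names here are assumptions, not the PR's actual options."""
    lang: str = "en"
    min_confidence: float = 0.5


# Assumed engine contract: path -> one list of (text, confidence) pairs per page.
Engine = Callable[[str], List[List[Tuple[str, float]]]]


class PaddleOCRLoader:
    def __init__(self, file_path: str, engine: Engine,
                 config: Optional[PaddleOCRConfig] = None) -> None:
        self.file_path = file_path
        self.engine = engine
        self.config = config or PaddleOCRConfig()

    def lazy_load(self) -> Iterator[Document]:
        # Translate low-level failures into the loader's own exception types.
        try:
            pages = self.engine(self.file_path)
        except FileNotFoundError as exc:
            raise OCRInputError(f"cannot read {self.file_path}") from exc
        except Exception as exc:
            raise OCREngineError(f"OCR failed for {self.file_path}") from exc
        # One Document per page, keeping only lines above the confidence threshold.
        for page_index, lines in enumerate(pages):
            kept = [text for text, score in lines
                    if score >= self.config.min_confidence]
            yield Document(
                page_content="\n".join(kept),
                metadata={"source": self.file_path, "page": page_index},
            )

    def load(self) -> List[Document]:
        return list(self.lazy_load())


def fake_engine(path: str) -> List[List[Tuple[str, float]]]:
    """Canned two-page result standing in for a real OCR run."""
    return [[("hello", 0.9), ("noise", 0.1)], [("world", 0.8)]]


docs = PaddleOCRLoader("scan.pdf", fake_engine).load()
```

Passing the engine in as a callable keeps the sketch testable without model weights; the actual loader presumably constructs the PaddleOCR pipeline itself.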

Modified files:

langchain_paddleocr/__init__.py: Add PaddleOCRLoader export (lazy import for PaddleOCRVLLoader)
langchain_paddleocr/document_loaders/__init__.py: Same
README.md / README_cn.md: Add PaddleOCRLoader usage docs
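
The PR does not show how the lazy import is implemented. One common approach is a module-level `__getattr__` (PEP 562), which defers importing a submodule until the exported name is first accessed. The sketch below demonstrates the mechanism on a throwaway module object with a standard-library class; `make_lazy_getattr` and `demo_pkg` are hypothetical names, and in a real package the `__getattr__` function would simply be defined at the top level of `__init__.py`.

```python
import importlib
import types


def make_lazy_getattr(namespace, lazy_names):
    """Build a PEP 562 __getattr__ that imports names on first access.

    lazy_names maps an exported name to the module that defines it.
    """
    def __getattr__(name):
        if name not in lazy_names:
            raise AttributeError(f"module has no attribute {name!r}")
        module = importlib.import_module(lazy_names[name])
        value = getattr(module, name)
        namespace[name] = value  # cache, so later lookups bypass __getattr__
        return value
    return __getattr__


# Throwaway module standing in for a package's __init__.py.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__getattr__ = make_lazy_getattr(
    demo_pkg.__dict__, {"JSONDecoder": "json.decoder"}
)
```

With this pattern, importing the package does not pull in a heavy optional dependency until a caller actually touches the exported class.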

paddle-bot · 2026-03-15T01:09:41Z

Thanks for your contribution!

Bobholamovic

Thank you very much for your contribution! I'd like to clarify a point to avoid any potential confusion: within the project, PaddleOCR typically refers to the general OCR capability, whereas the PP-Structure series is intended for more complex document parsing. Since they differ quite a bit in terms of scope and design goals, it might be better to keep them separated. From an architectural perspective, I would gently suggest not combining them into a single class, so that the design remains clearer and easier to maintain.

Ihebdhouibi · 2026-03-20T06:14:06Z

Absolutely spot on, I'll fix that and update the PR

Ihebdhouibi · 2026-04-03T13:44:45Z

@Bobholamovic Changes done as required, pending review

paddle-bot bot added the contributor label Mar 15, 2026

luotao1 assigned luotao1 and Bobholamovic Mar 18, 2026

Bobholamovic reviewed Mar 19, 2026

Ihebdhouibi force-pushed the feat/paddleocr-loader branch from d4cb412 to 433f158 on March 21, 2026 11:11

Ihebdhouibi added 3 commits on March 26, 2026 13:47:

399fd30: Merge branch 'main' into feat/paddleocr-loader
ecafc23: Merge branch 'main' into feat/paddleocr-loader
03c82fb: Merge branch 'main' into feat/paddleocr-loader

Ihebdhouibi wants to merge 4 commits into PaddlePaddle:main from Ihebdhouibi:feat/paddleocr-loader

3 participants
