feat: OCR as enrichment for pictures in simple pipeline (docx, pptx, html, etc) by dolfim-ibm · Pull Request #2488 · docling-project/docling

dolfim-ibm · 2025-10-17T14:12:03Z

This PR allows running the OCR step on the pictures found in documents converted with the SimplePipeline, e.g. docx, pptx, html, etc.

Unfinished work (TODO):

Actually call the OCR model.
Each OCR model currently implements its logic in the __call__ method. For this feature to work, it would be better to refactor and decouple some components.
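
The decoupling mentioned in the TODO could look roughly like the sketch below. This is not docling's actual API: `OcrEngine`, `TextCell`, `FakeEngine`, `ocr_page`, and `ocr_picture` are hypothetical names, and images are represented by plain strings so the example runs without any OCR backend. The only point is that recognition lives in one method that takes an image, so the existing page path and a picture enrichment path can both call it.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class TextCell:
    """Hypothetical OCR result: one recognized text fragment."""
    text: str
    confidence: float


class OcrEngine(Protocol):
    """Hypothetical interface: recognition only, no pipeline logic."""

    def recognize(self, image: str) -> list[TextCell]: ...


class FakeEngine:
    """Stand-in engine for illustration: the 'image' string is its own text."""

    def recognize(self, image: str) -> list[TextCell]:
        return [TextCell(text=word, confidence=1.0) for word in image.split()]


def ocr_page(engine: OcrEngine, page_image: str) -> list[TextCell]:
    # Page path: keeps the individual cells for later layout processing.
    return engine.recognize(page_image)


def ocr_picture(engine: OcrEngine, picture_image: str) -> str:
    # Enrichment path: reuses the same engine and joins the cells into
    # one text string that could be attached to the picture item.
    cells = engine.recognize(picture_image)
    return " ".join(cell.text for cell in cells)
```

With this split, the picture enrichment would not need to know which OCR backend is configured; it would only depend on the `recognize` method.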

Checklist:

Documentation has been updated, if necessary.
Examples have been added, if necessary.
Tests have been added, if necessary.

Signed-off-by: Michele Dolfi <[email protected]>

mergify · 2025-10-17T14:12:38Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
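
To see what this rule accepts, the pattern can be tried locally. The sketch below uses Python's `re` module, assuming its regex dialect behaves the same as Mergify's for this particular pattern; `is_conventional` is a helper name introduced here for illustration.

```python
import re

# Pattern copied verbatim from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


pr_title = "feat: OCR as enrichment for pictures in simple pipeline (docx, pptx, html, etc)"
print(is_conventional(pr_title))
```

The optional `(?:\(.+\))?` group allows a scope such as `fix(ocr):`, and the optional `(!)?` allows the breaking-change marker before the colon.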

github-actions · 2025-10-17T14:12:58Z

✅ DCO Check Passed

Thanks @dolfim-ibm, all your commits are properly signed off. 🎉

codecov · 2025-10-17T14:15:45Z

Codecov Report

❌ Patch coverage is 90.62500% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
docling/models/ocr_enrichment.py	89.28%	3 Missing ⚠️
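
The percentages in this report are consistent with simple line ratios. The sketch below checks that, assuming coverage is computed as covered lines over total lines; the totals of 32 and 28 lines are inferred from the reported percentages and are not stated in the report itself, and `coverage_pct` is a name introduced here.

```python
import math
from fractions import Fraction


def coverage_pct(covered: int, total: int) -> Fraction:
    """Coverage as an exact percentage (covered / total * 100)."""
    return Fraction(covered * 100, total)


# Inferred totals: 3 missing out of 32 patch lines, 3 missing out of 28 in the file.
patch = coverage_pct(32 - 3, 32)
file_level = coverage_pct(28 - 3, 28)

print(float(patch))                               # 90.625
print(math.floor(float(file_level) * 100) / 100)  # 89.28 (truncated, not rounded)
```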

📢 Thoughts on this report? Let us know!

dolfim-ibm · 2026-03-06T08:29:01Z

Not needed now, and new approaches will do it differently.

add ocr as enrichment for pictures in simple pipeline

ee5aedc

Signed-off-by: Michele Dolfi <[email protected]>

dolfim-ibm closed this Mar 6, 2026

dolfim-ibm deleted the ocr-enrichment branch March 6, 2026 08:29

dolfim-ibm wants to merge 1 commit into main from ocr-enrichment