
feat: OCR as enrichment for pictures in simple pipeline (docx, pptx, html, etc) #2488

Closed
dolfim-ibm wants to merge 1 commit into main from ocr-enrichment

@dolfim-ibm
Member

This PR makes it possible to run the OCR step on the pictures found in documents converted with the SimplePipeline, e.g. docx, pptx, html, etc.

Unfinished work (TODO):

  • Actually call the OCR model.
  • Each OCR model currently implements its logic in its call method. For this feature to work, it would be better to refactor and decouple some components.
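
A minimal sketch of what decoupling the recognition step from the call logic could look like. Every name here (`OcrEngine`, `recognize`, `PictureOcrEnrichment`, `Picture`, `TextCell`, `FakeEngine`, `min_confidence`) is hypothetical and is not docling's API; the sketch only assumes that an engine which maps an image to text cells can be reused by a picture enrichment step.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TextCell:
    """A recognized text fragment (hypothetical type)."""
    text: str
    confidence: float


@dataclass
class Picture:
    """Stand-in for a picture item produced by a simple-pipeline backend (hypothetical)."""
    image: bytes
    ocr_text: Optional[str] = None


class OcrEngine(ABC):
    """Hypothetical engine interface: pure recognition, no page or pipeline logic."""

    @abstractmethod
    def recognize(self, image: bytes) -> List[TextCell]:
        ...


class PictureOcrEnrichment:
    """Hypothetical enrichment step that reuses any OcrEngine on pictures."""

    def __init__(self, engine: OcrEngine, min_confidence: float = 0.5) -> None:
        self.engine = engine
        self.min_confidence = min_confidence

    def __call__(self, pictures: Iterable[Picture]) -> List[Picture]:
        enriched = []
        for pic in pictures:
            cells = self.engine.recognize(pic.image)
            # Keep only fragments at or above the confidence threshold.
            kept = [c.text for c in cells if c.confidence >= self.min_confidence]
            pic.ocr_text = " ".join(kept) if kept else None
            enriched.append(pic)
        return enriched


class FakeEngine(OcrEngine):
    """Toy engine returning fixed cells, standing in for a real OCR backend."""

    def recognize(self, image: bytes) -> List[TextCell]:
        return [TextCell("Hello", 0.9), TextCell("noise", 0.1), TextCell("world", 0.8)]


pics = PictureOcrEnrichment(FakeEngine())([Picture(image=b"...")])
print(pics[0].ocr_text)  # Hello world
```

Because the engine in this sketch only turns an image into text cells, the same implementation could in principle serve both a page-based pipeline and a picture enrichment step; this is one possible reading of the decoupling mentioned in the TODO, not the design the PR settled on.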

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

mergify bot commented Oct 17, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
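
The title rule can be reproduced locally with Python's `re` module. This is a minimal sketch that assumes mergify's `~=` operator performs a regular-expression search on the PR title; the pattern is copied from the rule above, and `is_conventional` is a hypothetical helper name.

```python
import re

# Pattern copied from the mergify rule; the leading "^" anchors it to the start of the title.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit type prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


pr_title = "feat: OCR as enrichment for pictures in simple pipeline (docx, pptx, html, etc)"
print(is_conventional(pr_title))              # True
print(is_conventional("feat(ocr)!: rework"))  # True: optional scope and "!" marker
print(is_conventional("Add OCR enrichment"))  # False: no type prefix
```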

@github-actions
Contributor

DCO Check Passed

Thanks @dolfim-ibm, all your commits are properly signed off. 🎉

codecov bot commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 90.62500% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| docling/models/ocr_enrichment.py | 89.28% | 3 Missing ⚠️ |
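
The two percentages are enough to back out how many coverable lines they refer to. A minimal sketch, assuming the report truncates (rather than rounds) the percentage to the number of decimals it displays; `candidate_totals` is a hypothetical helper, and exact `Fraction` arithmetic avoids floating-point edge cases.

```python
import math
from fractions import Fraction
from typing import List


def candidate_totals(missing: int, reported: str, max_total: int = 10_000) -> List[int]:
    """Return every total line count whose coverage, truncated to the decimals
    shown in `reported`, equals the reported percentage.

    Assumption: the report truncates the percentage instead of rounding it.
    """
    decimals = len(reported.split(".")[1]) if "." in reported else 0
    scale = 10 ** decimals
    target = Fraction(reported) * scale  # e.g. "89.28" -> 8928
    matches = []
    for total in range(missing + 1, max_total + 1):
        pct = Fraction(100 * (total - missing), total)  # exact coverage percentage
        if math.floor(pct * scale) == target:
            matches.append(total)
    return matches


print(candidate_totals(3, "90.62500"))  # [32]
print(candidate_totals(3, "89.28"))     # [28]
```

Under that assumption the patch spans 32 coverable lines (29 covered), 28 of them in `docling/models/ocr_enrichment.py` (25 covered), which would leave 4 fully covered lines in other files. With rounding instead of truncation, 25/28 would display as 89.29%, so the truncation assumption matters.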


mergify bot posted the same Merge Protections report (🟢 Enforce conventional commit) again on Feb 13, 2026 and Feb 21, 2026.

@dolfim-ibm
Member Author

Not needed now, and new approaches will do it differently.

dolfim-ibm closed this Mar 6, 2026
dolfim-ibm deleted the ocr-enrichment branch Mar 6, 2026 08:29
