feat: Add uspto backend meta-data extraction#2284

Closed
vku-ibm wants to merge 1 commit into main from vku/uspto_meta

Conversation

@vku-ibm
Member

@vku-ibm vku-ibm commented Sep 18, 2025

Adds metadata extraction to the USPTO backend, which parses USPTO patents in XML form.

Issue resolved by this Pull Request:
Resolves #2273

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@github-actions
Contributor

DCO Check Passed

Thanks @vku-ibm, all your commits are properly signed off. 🎉

@mergify
Contributor

mergify Bot commented Sep 18, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
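The rule above can be tried locally. The following is a minimal sketch that copies the pattern from the rule and assumes `~=` behaves like an ordinary regex search on the title. The helper name `is_conventional` and the second and third sample titles are made up for illustration.

```python
import re

# Pattern copied verbatim from the merge-protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix.

    Hypothetical helper; assumes the rule is a plain regex search.
    """
    return CONVENTIONAL_TITLE.search(title) is not None


if __name__ == "__main__":
    # Title of this pull request, then two illustrative titles.
    for title in (
        "feat: Add uspto backend meta-data extraction",
        "feat(backend)!: drop legacy parser",
        "Add uspto backend meta-data extraction",
    ):
        print(title, "->", is_conventional(title))
```

The optional `(?:\(.+\))?` group allows a scope such as `feat(backend):`, and the optional `(!)?` allows the breaking-change marker.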

def supported_formats(cls) -> Set["InputFormat"]:
    pass

@abstractmethod
Member

@cau-git cau-git Sep 19, 2025

This @abstractmethod tag should not be necessary since you are providing a default implementation.
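To illustrate the review comment, here is a minimal sketch of how `@abstractmethod` interacts with a default implementation in Python. All class names and the `extract_metadata` method are hypothetical and are not taken from the pull request's diff.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StrictBackend(ABC):
    """Hypothetical base: default body exists, but the method is still abstract."""

    @abstractmethod
    def extract_metadata(self) -> Dict[str, str]:
        return {}


class LenientBackend(ABC):
    """Hypothetical base: same default body, no @abstractmethod."""

    def extract_metadata(self) -> Dict[str, str]:
        return {}


class StrictChild(StrictBackend):
    pass  # does not override extract_metadata


class LenientChild(LenientBackend):
    pass  # inherits the default implementation


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if abstract methods block it."""
    try:
        cls()
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print(can_instantiate(StrictChild))
    print(can_instantiate(LenientChild))
```

With the decorator, every existing subclass has to override the method even though a usable default exists. Without it, subclasses inherit the default and override only when they have metadata to report.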

@PeterStaar-IBM
Member

@vku-ibm This is a really good addition! Could you, for the PDF pipelines,

  1. Extract the meta-data through the docling-parse metadata extraction methods
  2. Add the table of contents from the PDF (if it has one) via docling-parse?

@apupier

apupier commented Jan 22, 2026

@vku-ibm Do you plan to work again on this topic?

@vku-ibm
Member Author

vku-ibm commented Jan 23, 2026

> @vku-ibm Do you plan to work again on this topic?

@apupier not at the moment. It's a large set of work that requires handling specific tags for different versions and types of USPTO documents. We did this work in the past, but not in the scope of docling.

@PeterStaar-IBM
Member

I will close this for now.

Development

Successfully merging this pull request may close these issues.

Metadata in ConversionResult

4 participants