fix(tika): 🐛 support Apple Keynote .key file extraction #2325

Merged: dadoonet merged 2 commits into master from issue-782-keynote-extraction-test on Mar 12, 2026

Conversation

@dadoonet (Owner) commented Mar 12, 2026

  • Add test for Keynote extraction (keynoteIssue782)
  • Add test.key document and release note in Fix section

Closes #782


Note

Low Risk
Low risk: changes are limited to documentation and test coverage around Keynote parsing (with optional OCR), with no production parsing logic modified.

Overview
Adds regression coverage for Apple Keynote (.key) extraction (issue #782) by introducing two TikaDocParserTest cases: one that asserts slide text is extracted when OCR/Tesseract is available, and one that asserts only package-structure paths are indexed when OCR is disabled.

Updates documentation to note .key support in the 2.10 release notes and to clarify in formats.rst that slide text extraction requires OCR to be enabled.

Written by Cursor Bugbot for commit 1df778e. This will update automatically on new commits.
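
To make "package-structure paths" more concrete, here is a minimal Java sketch that lists the internal file paths of a `.key` document, whether it is a bundle directory or a single zip-wrapped file. It is not the FSCrawler or Tika implementation: the class `KeynotePackagePaths` and the method `listPaths` are hypothetical names, only JDK classes are used, and treating these paths as the ones indexed when OCR is disabled is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper for illustration; not part of FSCrawler or Tika. */
public class KeynotePackagePaths {

    /** Returns the sorted internal file paths of a .key package. */
    public static List<String> listPaths(Path keyFile) throws IOException {
        List<String> paths = new ArrayList<>();
        if (Files.isDirectory(keyFile)) {
            // Bundle layout: the .key document is a directory on disk.
            try (Stream<Path> walk = Files.walk(keyFile)) {
                walk.filter(Files::isRegularFile)
                    .map(p -> keyFile.relativize(p).toString()
                                     .replace(File.separatorChar, '/'))
                    .forEach(paths::add);
            }
        } else {
            // Single-file layout: the contents are wrapped in a zip archive.
            try (ZipFile zip = new ZipFile(keyFile.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    if (!entry.isDirectory()) {
                        paths.add(entry.getName());
                    }
                }
            }
        }
        Collections.sort(paths);
        return paths;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: KeynotePackagePaths <file.key>");
            return;
        }
        for (String path : listPaths(Path.of(args[0]))) {
            System.out.println(path);
        }
    }
}
```

Running it against a `.key` document prints one internal path per line. None of this output is slide text, which is consistent with the note above that slide text extraction requires OCR.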

Made-with: Cursor
@dadoonet self-assigned this on Mar 12, 2026
@dadoonet added the labels test (Related to tests only) and component:extractor (For Tika, XML and JSON parsers) on Mar 12, 2026
@dadoonet merged commit bd99eaa into master on Mar 12, 2026
18 checks passed
@dadoonet deleted the issue-782-keynote-extraction-test branch on March 12, 2026 20:36
@budachst

Oh wow - I'd never have guessed that I would ever hear back about that. Does this mean Tika can now index Keynote files? That would be awesome.

@dadoonet (Owner, Author)

Yes, but only with Tesseract (OCR) enabled, according to my tests.

@budachst

That's really great. Does it also work with both kinds of Keynote/iWork files: bundles and the single-file format, where the contents are wrapped in a zip file?

Development

Successfully merging this pull request may close these issues.

Indexing Apple keynote files doesn't seem to work