fix(tika): 🐛 support Apple Keynote .key file extraction #2325
Merged
- Add test for Keynote extraction (keynoteIssue782)
- Add test.key document and release note in Fix section

Closes #782

Made-with: Cursor
Oh wow, I'd never have guessed that I would ever hear back about that. Does this mean Tika can now index Keynote files? That would be awesome.
Owner
Author
Yes, but only with Tesseract enabled, according to my tests.
That's really great. Does that also work with both kinds of Keynote/iWork files: bundles and the single-file format, where the contents are wrapped in a zip file?
Owner
Author
I just tested with a |



Closes #782
Note
Low Risk
Low risk: changes are limited to documentation and test coverage around Keynote parsing (with optional OCR), with no production parsing logic modified.
Overview
Adds regression coverage for Apple Keynote (`.key`) extraction (issue #782) by introducing two `TikaDocParserTest` cases: one that asserts slide text is extracted when OCR/Tesseract is available, and one that asserts only package-structure paths are indexed when OCR is disabled.

Updates documentation to note `.key` support in the 2.10 release notes and to clarify in `formats.rst` that slide text extraction requires OCR to be enabled.

Written by Cursor Bugbot for commit 1df778e.
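
To make "package-structure paths" concrete, here is a minimal sketch of what can be read from a zip-packaged `.key` file without OCR: the entry names inside the archive. This is not the test added in this PR and not Tika's parser. The class `KeynotePackagePaths`, its methods, and the entry names in the sample archive are all hypothetical and chosen for illustration only; it uses nothing beyond `java.util.zip`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the project or of Tika.
public class KeynotePackagePaths {

    /** Returns the entry names of a zip-packaged document, in archive order. */
    public static List<String> listEntryPaths(InputStream in) throws IOException {
        List<String> paths = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                paths.add(entry.getName());
            }
        }
        return paths;
    }

    /** Builds a tiny in-memory zip; the entry names are made up to stand in for a .key package. */
    public static byte[] buildSamplePackage() throws IOException {
        String[] names = {"Index/Document.iwa", "Data/image.png", "Metadata/Properties.plist"};
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("placeholder".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print every path found in the sample archive, one per line.
        List<String> paths = listEntryPaths(new ByteArrayInputStream(buildSamplePackage()));
        paths.forEach(System.out::println);
    }
}
```

Paths like these carry no slide text, which is consistent with the note above that text extraction requires OCR. A bundle-style `.key` (a directory rather than a single zip file) would need a directory walk instead and is not covered by this sketch.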