test(tika): 🧪 add email alternatives regression test for #494 #2324

Merged
dadoonet merged 1 commit into master from issue-494-email-alternatives-test on Mar 12, 2026

Conversation

@dadoonet (Owner) commented Mar 12, 2026

  • Add sample .eml with multipart/alternative (text/plain + text/html)
  • Add emailIssue494NoDuplicateContent in TikaDocParserTest
  • Assert body text appears only once (Tika 1.17+ extracts a single alternative)

Closes #494
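
The "appears only once" assertion can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code of emailIssue494NoDuplicateContent: the class name, the `countOccurrences` helper, and the sample strings are all assumptions, and a real test would take the extracted content from the parser instead of hard-coded strings.

```java
// Hypothetical sketch: not the actual emailIssue494NoDuplicateContent test.
class SingleOccurrenceSketch {

    /** Counts non-overlapping occurrences of needle in haystack. */
    public static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // continue searching after this match
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in strings for extracted content (assumed, not from the fixture).
        String single = "Hello from the sample email body.";
        String duplicated = single + "\n" + single;

        System.out.println(countOccurrences(single, "sample email body"));     // prints 1
        System.out.println(countOccurrences(duplicated, "sample email body")); // prints 2
    }
}
```

In a JUnit test, the same idea would likely reduce to asserting that the count equals 1 for a distinctive phrase from the email body.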

Note

Low Risk
Low risk: adds only a test fixture and a JUnit regression test, with no production code changes. The main risk is that the test fails if Tika’s extraction behavior changes again in a later version.

Overview
Adds a new .eml test fixture representing a multipart/alternative email with both text/plain and text/html bodies.

Extends TikaDocParserTest with a regression test for issue #494 asserting the email body text is extracted only once (no duplicate content from multiple alternatives).
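
To illustrate why a multipart/alternative message can yield duplicate content, here is a minimal sketch that builds such a message and lists its body parts. It is not the fixture added in this PR: the boundary, addresses, subject, and body text are invented for illustration, and `partContentTypes` is a naive line scanner, not a real MIME parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the message below is invented, not the PR's .eml fixture.
class MultipartAlternativeSketch {

    static final String BOUNDARY = "sketch-boundary";
    static final String BODY = "Hello from the sample email body.";

    /** Builds a minimal multipart/alternative message with text/plain and text/html parts. */
    public static String buildSampleEml() {
        return String.join("\r\n",
            "From: sender@example.com",
            "To: recipient@example.com",
            "Subject: Sketch for multipart/alternative",
            "MIME-Version: 1.0",
            "Content-Type: multipart/alternative; boundary=\"" + BOUNDARY + "\"",
            "",
            "--" + BOUNDARY,
            "Content-Type: text/plain; charset=UTF-8",
            "",
            BODY,
            "--" + BOUNDARY,
            "Content-Type: text/html; charset=UTF-8",
            "",
            "<html><body><p>" + BODY + "</p></body></html>",
            "--" + BOUNDARY + "--",
            "");
    }

    /** Returns the media type of each body part, in order of appearance. */
    public static List<String> partContentTypes(String eml) {
        List<String> types = new ArrayList<>();
        boolean inPartHeaders = false;
        for (String line : eml.split("\r\n")) {
            if (line.equals("--" + BOUNDARY)) {
                inPartHeaders = true;   // a new body part starts here
            } else if (inPartHeaders && line.isEmpty()) {
                inPartHeaders = false;  // a blank line ends the part headers
            } else if (inPartHeaders && line.startsWith("Content-Type:")) {
                String value = line.substring("Content-Type:".length()).trim();
                int semicolon = value.indexOf(';');
                types.add(semicolon >= 0 ? value.substring(0, semicolon) : value);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        String eml = buildSampleEml();
        System.out.println(partContentTypes(eml)); // prints [text/plain, text/html]
        // The same body text is present in both alternatives of the raw message.
        System.out.println(eml.indexOf(BODY) != eml.lastIndexOf(BODY)); // prints true
    }
}
```

Because both parts carry the same text, an extractor that concatenated every alternative would emit the body twice, which is the behavior the regression test guards against.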

Written by Cursor Bugbot for commit f4b6d04.

- Add sample .eml with multipart/alternative (text/plain + text/html)
- Add emailIssue494NoDuplicateContent in TikaDocParserTest
- Assert body text appears only once (Tika 1.17+ single alternative)

Closes #494

Made-with: Cursor
@dadoonet dadoonet self-assigned this Mar 12, 2026
@dadoonet dadoonet added the test (Related to tests only) and component:extractor (For Tika, XML and JSON parsers) labels Mar 12, 2026
@dadoonet dadoonet merged commit bc9f90d into master Mar 12, 2026
17 of 18 checks passed
@dadoonet dadoonet deleted the issue-494-email-alternatives-test branch March 12, 2026 19:03

Labels

component:extractor (For Tika, XML and JSON parsers)
test (Related to tests only)

Development

Successfully merging this pull request may close these issues.

Add an option to skip extracting email alternative text

1 participant