Skip to content

fix(cbz-reader): CBZ reading regression and optimize ZIP parsing strategy#2593

Open
balazs-szucs wants to merge 5 commits intobooklore-app:developfrom
balazs-szucs:cbx-unicode
Open

fix(cbz-reader): CBZ reading regression and optimize ZIP parsing strategy#2593
balazs-szucs wants to merge 5 commits intobooklore-app:developfrom
balazs-szucs:cbx-unicode

Conversation

@balazs-szucs
Copy link
Collaborator

🚀 Pull Request

📝 Description

Closes: #2589

Fixed a regression in Booklore 1.18.5 where some CBZ files could not be opened due to strict parsing of Unicode Extra Fields in the updated commons-compress library.

🛠️ Changes Implemented

  • CbxReaderService: If standard ZIP parsing (with Unicode Extra Fields) fails, the system now automatically retries reading the archive without Unicode Extra Fields. This ensures compatibility with archives containing malformed extra fields.
  • Optimized Encoding Iteration: Refactored the scanning logic to iterate through all parameter combinations (Fast/Slow path × Unicode Enabled/Disabled) per encoding before moving to the next encoding. This is significantly more efficient than the previous strategy of trying all encodings with one setting and then restarting.
  • CachedArchiveMetadata: to store the successful useUnicodeExtraFields setting along with the encoding, ensuring that subsequent file access (like streaming pages) uses the proven settings immediately.

🧪 Testing Strategy

📸 Visual Changes (if applicable)


⚠️ Required Pre-Submission Checklist

Please Read - This Checklist is Mandatory

Important Notice: We've experienced several production bugs recently due to incomplete pre-submission checks. To maintain code quality and prevent issues from reaching production, we're enforcing stricter adherence to this checklist.

All checkboxes below must be completed before requesting review. PRs that haven't completed these requirements will be sent back for completion.

Mandatory Requirements (please check ALL boxes):

  • Code adheres to project style guidelines and conventions
  • Branch synchronized with latest develop branch (please resolve any merge conflicts)
  • 🚨 CRITICAL: Automated unit tests added/updated to cover changes (MANDATORY for ALL Spring Boot backend and Angular frontend changes - this is non-negotiable)
  • 🚨 CRITICAL: All tests pass locally (run ./gradlew test for Spring Boot backend, and ng test for Angular frontend - NO EXCEPTIONS)
  • 🚨 CRITICAL: Manual testing completed in local development environment (verify your changes work AND no existing functionality is broken - test related features thoroughly)
  • Flyway migration versioning follows correct sequence (if database schema was modified)
  • Documentation PR submitted to booklore-docs (required for features or enhancements that introduce user-facing or visual changes)

Why This Matters:

Recent production incidents have been traced back to:

  • Incomplete testing coverage (especially backend)
  • Merge conflicts not resolved before merge
  • Missing documentation for new features

Backend changes without tests will not be accepted. By completing this checklist thoroughly, you're helping maintain the quality and stability of Booklore for all users.

Note to Reviewers: Please verify the checklist is complete before beginning your review. If items are unchecked, kindly ask the contributor to complete them first.


💬 Additional Context (optional)

@balazs-szucs balazs-szucs changed the title fix(unzip): CBZ reading regression and optimize ZIP parsing strategy fix(cbz-reader): CBZ reading regression and optimize ZIP parsing strategy Feb 3, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a CBZ reading regression introduced by stricter Unicode Extra Field parsing in commons-compress by adding a fallback ZIP parsing mode (disabling Unicode Extra Fields), optimizing the ZIP strategy iteration order, and caching the successful ZIP settings for subsequent reads/streams.

Changes:

  • Add ZIP fallback logic to retry without Unicode Extra Fields when standard parsing fails.
  • Refactor ZIP strategy iteration to try (Fast/Slow × Unicode on/off) per encoding for better efficiency.
  • Extend cached archive metadata to remember the successful useUnicodeExtraFields setting; add a unit test covering the fallback path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java Adds Unicode Extra Fields fallback, strategy iteration refactor, and caches useUnicodeExtraFields for streaming.
booklore-api/src/test/java/org/booklore/service/reader/CbxReaderServiceTest.java Adds a test validating the CBZ fallback behavior when Unicode Extra Fields parsing fails.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +309 to +310
} catch (Exception e) {
log.trace("Stream strategy failed ({}, Fast={}, Unicode={}): {}", charset, useFastPath, useUnicode, e.getMessage());
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryStreamEntry catches all Exception, including IOException from reading the ZIP entry or writing to outputStream (e.g., client disconnect). Swallowing those and continuing can lead to misleading errors (eventually “Unable to find entry…”) and potential repeated/partial writes. Consider only catching the ZIP parsing/charset-related exceptions you expect here, and let IOException propagate (or rethrow it).

Suggested change
} catch (Exception e) {
log.trace("Stream strategy failed ({}, Fast={}, Unicode={}): {}", charset, useFastPath, useUnicode, e.getMessage());
} catch (IOException e) {
// I/O problems (e.g., client disconnect, read/write failures) should abort the operation
throw e;
} catch (Exception e) {
// Non-I/O problems (e.g., charset/ZIP parsing issues) mean this strategy failed; try the next one
log.trace("Stream strategy failed ({}, Fast={}, Unicode={}): {}", charset, useFastPath, useUnicode, e.getMessage());

Copilot uses AI. Check for mistakes.
}

for (Charset charset : ENCODINGS_TO_TRY) {
if (charset.equals(encoding)) continue; // Skip failed cached encoding if we want, or retry it with different flags?
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comment on skipping the cached encoding (“Skip failed cached encoding if we want, or retry it with different flags?”) reads like an unresolved design note. Please either remove it or replace it with a concrete TODO linked to an issue so it doesn’t linger in production code.

Suggested change
if (charset.equals(encoding)) continue; // Skip failed cached encoding if we want, or retry it with different flags?
if (charset.equals(encoding)) continue; // Skip cached encoding when trying alternative charsets

Copilot uses AI. Check for mistakes.
Comment on lines +258 to 260

throw new IOException("Unable to read ZIP archive with any supported encoding");
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final throw new IOException("Unable to read ZIP archive with any supported encoding") drops the underlying/root exception that caused all strategies to fail. This makes production failures hard to diagnose (especially since per-strategy logs are trace). Consider keeping the last thrown exception and attaching it as the cause of this IOException (or including its type/message).

Copilot uses AI. Check for mistakes.
if (encoding != null) {
if (tryStreamEntry(cbxPath, entryName, outputStream, encoding, true, useUnicode)) return;
if (tryStreamEntry(cbxPath, entryName, outputStream, encoding, false, useUnicode)) return;
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 290 has a visibly over-indented closing brace (}), which looks accidental and hurts readability (and can trip style checks if enforced). Please fix the indentation to match the surrounding block.

Suggested change
}
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Possible regression with opening CBZ files

1 participant