fix(cbz-reader): CBZ reading regression and optimize ZIP parsing strategy#2593
fix(cbz-reader): CBZ reading regression and optimize ZIP parsing strategy#2593balazs-szucs wants to merge 5 commits intobooklore-app:developfrom
Conversation
…nd fallback mechanisms Signed-off-by: Balázs Szücs <[email protected]>
There was a problem hiding this comment.
Pull request overview
Fixes a CBZ reading regression introduced by stricter Unicode Extra Field parsing in commons-compress by adding a fallback ZIP parsing mode (disabling Unicode Extra Fields), optimizing the ZIP strategy iteration order, and caching the successful ZIP settings for subsequent reads/streams.
Changes:
- Add ZIP fallback logic to retry without Unicode Extra Fields when standard parsing fails.
- Refactor ZIP strategy iteration to try (Fast/Slow × Unicode on/off) per encoding for better efficiency.
- Extend cached archive metadata to remember the successful
useUnicodeExtraFieldssetting; add a unit test covering the fallback path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java | Adds Unicode Extra Fields fallback, strategy iteration refactor, and caches useUnicodeExtraFields for streaming. |
| booklore-api/src/test/java/org/booklore/service/reader/CbxReaderServiceTest.java | Adds a test validating the CBZ fallback behavior when Unicode Extra Fields parsing fails. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } catch (Exception e) { | ||
| log.trace("Stream strategy failed ({}, Fast={}, Unicode={}): {}", charset, useFastPath, useUnicode, e.getMessage()); |
There was a problem hiding this comment.
tryStreamEntry catches all Exception, including IOException from reading the ZIP entry or writing to outputStream (e.g., client disconnect). Swallowing those and continuing can lead to misleading errors (eventually “Unable to find entry…”) and potential repeated/partial writes. Consider only catching the ZIP parsing/charset-related exceptions you expect here, and let IOException propagate (or rethrow it).
| } catch (Exception e) { | |
| log.trace("Stream strategy failed ({}, Fast={}, Unicode={}): {}", charset, useFastPath, useUnicode, e.getMessage()); | |
| } catch (IOException e) { | |
| // I/O problems (e.g., client disconnect, read/write failures) should abort the operation | |
| throw e; | |
| } catch (Exception e) { | |
| // Non-I/O problems (e.g., charset/ZIP parsing issues) mean this strategy failed; try the next one | |
| log.trace("Stream strategy failed ({}, Fast={}, Unicode={}): {}", charset, useFastPath, useUnicode, e.getMessage()); |
| } | ||
|
|
||
| for (Charset charset : ENCODINGS_TO_TRY) { | ||
| if (charset.equals(encoding)) continue; // Skip failed cached encoding if we want, or retry it with different flags? |
There was a problem hiding this comment.
The inline comment on skipping the cached encoding (“Skip failed cached encoding if we want, or retry it with different flags?”) reads like an unresolved design note. Please either remove it or replace it with a concrete TODO linked to an issue so it doesn’t linger in production code.
| if (charset.equals(encoding)) continue; // Skip failed cached encoding if we want, or retry it with different flags? | |
| if (charset.equals(encoding)) continue; // Skip cached encoding when trying alternative charsets |
|
|
||
| throw new IOException("Unable to read ZIP archive with any supported encoding"); | ||
| } |
There was a problem hiding this comment.
The final throw new IOException("Unable to read ZIP archive with any supported encoding") drops the underlying/root exception that caused all strategies to fail. This makes production failures hard to diagnose (especially since per-strategy logs are trace). Consider keeping the last thrown exception and attaching it as the cause of this IOException (or including its type/message).
| if (encoding != null) { | ||
| if (tryStreamEntry(cbxPath, entryName, outputStream, encoding, true, useUnicode)) return; | ||
| if (tryStreamEntry(cbxPath, entryName, outputStream, encoding, false, useUnicode)) return; | ||
| } |
There was a problem hiding this comment.
Line 290 has a visibly over-indented closing brace (}), which looks accidental and hurts readability (and can trip style checks if enforced). Please fix the indentation to match the surrounding block.
| } | |
| } |
🚀 Pull Request
📝 Description
Closes: #2589
Fixed a regression in Booklore 1.18.5 where some CBZ files could not be opened due to strict parsing of Unicode Extra Fields in the updated commons-compress library.
🛠️ Changes Implemented
🧪 Testing Strategy
📸 Visual Changes (if applicable)
Please Read - This Checklist is Mandatory
Mandatory Requirements (please check ALL boxes):
developbranch (please resolve any merge conflicts)./gradlew testfor Spring Boot backend, andng testfor Angular frontend - NO EXCEPTIONS)Why This Matters:
Recent production incidents have been traced back to:
Backend changes without tests will not be accepted. By completing this checklist thoroughly, you're helping maintain the quality and stability of Booklore for all users.
Note to Reviewers: Please verify the checklist is complete before beginning your review. If items are unchecked, kindly ask the contributor to complete them first.
💬 Additional Context (optional)