Allow seeking back to read skipped chunks#657
Allow seeking back to read skipped chunks#657fintelia wants to merge 1 commit intoimage-rs:masterfrom
Conversation
197g
left a comment
There was a problem hiding this comment.
I like it in principle, seeking is a capability that will be highly in the streaming decoder. Since all the seeks are relation we should make sure that intermittent interrupts work correctly with the position tracking. I don't forsee many problems with this but a few tests would still be added.
| if self.exif_position.is_none() { | ||
| self.exif_position = Some(chunk_start); | ||
| } |
There was a problem hiding this comment.
The error treatment here differs from parsing the eXIf and iCCP chunk. In the latter case we error first with a Format and then later, probably, downgrade that to BadAncillaryChunk in most cases. Here however we do nothing.
There was a problem hiding this comment.
Also do we consider the option changing through set_ignore_exif_chunk while it is read, i.e. first reading one fully and then only recording a position would still be a duplicate chunk. That may not happen but without the interface really assuring it we should know if that is in-scope of the implementation or not.
| let chunk_start = stream_position - 8 - self.current_chunk.raw_bytes.len() as u64; | ||
|
|
||
| match self.current_chunk.type_ { | ||
| chunk::tEXt | chunk::zTXt | chunk::iTXt => { |
There was a problem hiding this comment.
Wouldn't we still want to know the exact kind of text chunk? zTXt is handled very different from the other chunks. We can of course parse it back when we seek back and re-read the chunk anew but it's fixed-size metadata—so we might as well.
There was a problem hiding this comment.
Yeah, I wasn't sure whether to save the chunk size/type or just re-read it when requested. Easy enough to track it alongside the chunk start position.
With #630 in place, we now have most of the pieces to load PNG metadata without consuming an unbounded amount of space.
The key here is that we track just the start position of each unbounded-size chunk we might care about. Which means that we can track thousands of text chunks with even a tiny space limit. When it comes time to read the chunks, we can seek back to them and provide the desired info. But if the chunk info is never requested then we don't pay the overhead of storing them. When in recording-mode, I believe that the only non-constant memory usage should be recording text chunk positions and reserving two rows of space for the unfiltering buffer.
This is particularly useful for the
imagecrate which needs to know whether there's a tRNS chunk (to determine whether the color type is RGB or RGBA) before it knows whether the EXIF or ICC profile will be requested. By doing things this way, it neatly sidesteps the question of setting decoding limits before reading metadata. The main thing decoding limits were being used for was prevent zip-bombs in the compressed iCCP chunks and to a lesser degree preventing unreasonably sized EXIF or text chunks.