Internal GeoTiff block layout and read patterns. Change inbound #376
davidfrantz started this conversation in General
Replies: 1 comment
This is now live in dev!
In preparation for some experiments I want to do later, I implemented functionality to process chunks more flexibly (this will be pushed to dev soon, but some decisions still need to be made).
Some background on the current method:
When using GeoTiff, files are currently stored in a striped layout with a user-defined block height and a block width that equals the tile size. Example: when using 30 km tiles and 3 km blocks, 10 blocks are created, i.e., 1 column and 10 rows (X1-Y10). When processing those files (higher level), it is generally recommended to use a reading pattern that exactly corresponds to those blocks, or to multiple blocks while staying on the block boundaries. Splitting a block, however, is unwise.
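To make "staying on the block boundaries" concrete, here is a minimal Python sketch of the striped block grid and an alignment check for a read window. It assumes a 10 m pixel size, so a 30 km tile is 3000 px and a 3 km block is 300 rows (for 30 m pixels it would be 1000 px and 100 rows). The function names are illustrative only and not part of FORCE.

```python
def striped_grid(tile_px: int, block_height_px: int) -> tuple[int, int]:
    """Return (columns, rows) of physical blocks in a striped layout.

    Each strip spans the full tile width, so there is always one column.
    """
    if tile_px % block_height_px != 0:
        raise ValueError("block height must divide the tile size evenly")
    return 1, tile_px // block_height_px


def on_block_boundaries(x0: int, y0: int, width: int, height: int,
                        block_w: int, block_h: int) -> bool:
    """True if a read window starts and ends on physical block boundaries."""
    return (x0 % block_w == 0 and width % block_w == 0 and
            y0 % block_h == 0 and height % block_h == 0)


# Assumed 10 m pixels: 30 km tile -> 3000 px, 3 km block -> 300 rows
cols, rows = striped_grid(3000, 300)
print(f"X{cols}-Y{rows}")                                  # X1-Y10
print(on_block_boundaries(0, 600, 3000, 600, 3000, 300))   # True: two whole strips
print(on_block_boundaries(1500, 0, 1500, 300, 3000, 300))  # False: splits a strip in x
```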
Currently, processing (and hence reading) patterns follow this striped layout, but the height of the processed block can be configured to handle situations where RAM becomes an issue. With the new, more flexible implementation, however, a tiled reading pattern becomes possible, too. This has severe performance ramifications if used improperly.
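The RAM side of this trade-off can be estimated roughly: the raw input buffered for one processing block scales linearly with the block height. A minimal sketch, assuming int16 values and hypothetical counts of 10 bands and 100 images; none of these numbers come from FORCE, and outputs or intermediate arrays are ignored.

```python
def input_buffer_bytes(width_px: int, height_px: int, bands: int,
                       images: int, bytes_per_value: int = 2) -> int:
    """Rough size of the raw input buffered for one processing block.

    Ignores outputs, masks and any intermediate arrays; int16 by default.
    """
    return width_px * height_px * bands * images * bytes_per_value


# Hypothetical: 3000 px wide tile, 10 bands, 100 images
for height in (300, 150):
    gb = input_buffer_bytes(3000, height, bands=10, images=100) / 1e9
    print(f"{height} rows: {gb:.1f} GB")
# 300 rows: 1.8 GB
# 150 rows: 0.9 GB
```

Halving the block height halves this buffer, which is why the height is the natural knob when RAM is tight.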
As an example, see this table, where I measured reading speeds with different reading patterns (numbers are in seconds). The data comprise one year of Sentinel-2 and Landsat data for one 30 km tile. The physical file layout is X1-Y10. As expected, running the processing on an X1-Y10 reading pattern (or multiples of it) is most efficient. In contrast, splitting up blocks is hugely disadvantageous (as soon as more than one block is used in the x-direction).
Implementing tiled file layouts
So, the obvious way to better support tiled reading patterns would be to use tiled file layouts. Surprisingly, though, this is not possible with the GeoTiff driver, at least not under the constraint that blocks should fit perfectly within a tile (and I am currently not planning on changing that constraint): when tiling is used, block sizes must be a multiple of 16. So, I experimented with this and used a 256 x 256 block size, regardless of whether it is Sentinel-2 or Landsat (this means Sentinel-2 has more blocks than Landsat). This is the same test as before, run on the tiled layout.
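The multiple-of-16 rule is what rules out a perfect fit. A quick check, assuming 10 m Sentinel-2 and 30 m Landsat pixels (3000 px and 1000 px per 30 km tile): no multiple of 16 divides either tile size evenly, and 256 px blocks yield 12 x 12 and 4 x 4 grids whose last row and column are only partially filled. The helper names are illustrative.

```python
import math


def perfect_fit_sizes(tile_px: int) -> list[int]:
    """Multiple-of-16 block sizes that divide the tile size without remainder."""
    return [b for b in range(16, tile_px + 1, 16) if tile_px % b == 0]


def tiled_grid(tile_px: int, block_px: int = 256) -> tuple[int, int]:
    """(columns, rows) of square blocks; edge blocks may be partially filled."""
    n = math.ceil(tile_px / block_px)
    return n, n


# Assumed pixel sizes: 10 m Sentinel-2, 30 m Landsat on a 30 km tile
for sensor, tile_px in [("Sentinel-2", 3000), ("Landsat", 1000)]:
    print(sensor, perfect_fit_sizes(tile_px), tiled_grid(tile_px))
# Sentinel-2 [] (12, 12)
# Landsat [] (4, 4)
```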
This yields reading speeds similar to the block-perfect reading pattern above, provided a similar number of blocks is used. However, it is also apparent that the penalty for not sticking to block boundaries is much less pronounced. This layout may be a bit harder to configure (because there is no "correct" combination anymore), but my feeling is that it is much better in general.
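Why splitting blocks hurts much less with tiles can be illustrated with a simplified model: count how many pixels must be decompressed to read a whole tile in a given reading pattern, relative to the pixels actually requested. This is a sketch, not a reproduction of the measured timings. It assumes that any block touched by a chunk is decompressed in full, that nothing is reused from a cache between chunks, and a 3000 px tile (30 km at an assumed 10 m).

```python
def decoded_ratio(tile_px: int, block_w: int, block_h: int,
                  chunks_x: int, chunks_y: int) -> float:
    """Pixels decompressed / pixels requested for a full-tile read.

    Simplifying assumptions: every block a chunk touches is decompressed
    in full, and decompressed blocks are not reused between chunks.
    """
    decoded = 0
    for cx in range(chunks_x):
        x0, x1 = cx * tile_px // chunks_x, (cx + 1) * tile_px // chunks_x
        nbx = -(-x1 // block_w) - x0 // block_w  # blocks touched in x
        for cy in range(chunks_y):
            y0, y1 = cy * tile_px // chunks_y, (cy + 1) * tile_px // chunks_y
            nby = -(-y1 // block_h) - y0 // block_h  # blocks touched in y
            decoded += nbx * nby * block_w * block_h
    return decoded / (tile_px * tile_px)


patterns = [(1, 10), (2, 10), (5, 10), (10, 10)]  # X1-Y10 ... X10-Y10
layouts = [("striped 3000x300", 3000, 300), ("tiled 256x256", 256, 256)]
for name, bw, bh in layouts:
    ratios = [decoded_ratio(3000, bw, bh, cx, cy) for cx, cy in patterns]
    print(name, " ".join(f"{r:.2f}" for r in ratios))
# striped 3000x300 1.00 2.00 5.00 10.00
# tiled 256x256 1.84 1.99 2.45 3.21
```

In this model, each extra split in x forces the striped file to decode every strip once more, while the tiled file only pays extra for the tiles that straddle chunk edges.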
The way forward
I am leaning towards making this the new default GTiff format in FORCE. I will also switch to ZSTD compression instead of LZW. If the old layout still needs to be used, e.g., for data cube continuity reasons, it can be enabled by providing the explicit driver configuration through the custom GDAL output parameter. I will share a working template for this.

This new layout is very similar to the COG format (no overviews, though). A big advantage is that it can be updated, i.e., it works with the block-based processing strategy of FORCE.

Let me know if you have concerns.
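For reference, the two layouts map onto standard GDAL GTiff creation options (`TILED`, `BLOCKXSIZE`, `BLOCKYSIZE`, `COMPRESS`). The helper below is a hypothetical sketch that only builds the option strings; it does not reproduce the FORCE template mentioned above, whose syntax is not shown here. `ZSTD` requires a GDAL build with zstd support, and the 300-row strip height assumes 10 m pixels.

```python
def gtiff_creation_options(tiled: bool, block_px: int, compress: str) -> list[str]:
    """Build GDAL GTiff creation options as KEY=VALUE strings.

    tiled=True  -> square tiles of block_px x block_px (multiple of 16)
    tiled=False -> full-width strips that are block_px rows high
    """
    if tiled:
        if block_px % 16 != 0:
            raise ValueError("tile size must be a multiple of 16")
        return ["TILED=YES", f"BLOCKXSIZE={block_px}",
                f"BLOCKYSIZE={block_px}", f"COMPRESS={compress}"]
    return ["TILED=NO", f"BLOCKYSIZE={block_px}", f"COMPRESS={compress}"]


new_default = gtiff_creation_options(True, 256, "ZSTD")
old_layout = gtiff_creation_options(False, 300, "LZW")  # striped X1-Y10
print(new_default)  # ['TILED=YES', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256', 'COMPRESS=ZSTD']
print(old_layout)   # ['TILED=NO', 'BLOCKYSIZE=300', 'COMPRESS=LZW']
```

Lists like these can be passed to GDAL's Python bindings, e.g., as `creationOptions` in `gdal.Translate`.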