critical impact of specifying ds_chunks

Hi everyone,

I noticed something that I think is worth mentioning because I don't see it in the docs, although I am not sure it should lead to a change in the library.

It is very important to specify the `input_ds_chunks` parameter when calling `load_xorca_dataset`. Otherwise, reading the data can trigger an unreasonably large number of tasks (many of them `open_dataset` and rechunking operations), which drastically increases the computation time. My take is that it is best to specify the actual chunks used in the netCDF file (I am not sure how this works out for contiguous storage). Specifying the same chunks as `target_ds_chunks` might also be an option, depending on whether the read operations or the rechunking/transfer of data between workers dominates the computation time.
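To make the point about task counts concrete, here is a minimal sketch. The dimension sizes, the chunk shapes and the helper names (`count_chunks`, `load_with_explicit_chunks`) are made up for illustration, and I am assuming the usual NEMO dimension names on the input side and the xorca-renamed ones (`t`, `z_c`, `y_c`, `x_c`) on the target side. The real on-disk chunking can be inspected with `ncdump -hs <file>` (look for `_ChunkSizes`).

```python
from math import ceil, prod

# Hypothetical dimension sizes of one variable, for illustration only.
dim_sizes = {"time_counter": 73, "deptht": 46, "y": 200, "x": 300}

# Assumed on-disk netCDF chunking: one time step per chunk, full 3D field.
input_ds_chunks = {"time_counter": 1, "deptht": 46, "y": 200, "x": 300}

# A much finer chunking, to show how quickly the chunk count grows.
fine_chunks = {"time_counter": 1, "deptht": 1, "y": 200, "x": 300}

# Assumed target chunking, using the dimension names after xorca's renaming.
target_ds_chunks = {"t": 1, "z_c": 46, "y_c": 200, "x_c": 300}


def count_chunks(dim_sizes, chunks):
    """Number of dask chunks one variable is split into.

    Dimensions missing from `chunks` are treated as a single chunk.
    """
    return prod(
        ceil(size / chunks.get(dim, size)) for dim, size in dim_sizes.items()
    )


def load_with_explicit_chunks(data_files, aux_files):
    """Call load_xorca_dataset with both chunk dicts given explicitly."""
    # Imported lazily so the chunk arithmetic above runs without xorca.
    from xorca.lib import load_xorca_dataset

    return load_xorca_dataset(
        data_files=data_files,
        aux_files=aux_files,
        input_ds_chunks=input_ds_chunks,
        target_ds_chunks=target_ds_chunks,
    )


print(count_chunks(dim_sizes, input_ds_chunks))
print(count_chunks(dim_sizes, fine_chunks))
```

Each chunk becomes at least one task per variable in the dask graph, so a chunking much finer than what is stored on disk multiplies both the read tasks and the rechunking tasks needed to reach `target_ds_chunks`.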

I attach a [PDF of my notebook](https://github.com/willirath/xorca/files/8428559/test_xorca_reading.pdf) where I tested this on a small subdomain of a larger simulation, on my laptop. It shows that computing a mean takes only 1 s when `input_ds_chunks` is specified, but goes up to 1 minute when `input_ds_chunks` is left blank.

