SYNOP datareader functionality for German stations #1660
base: clessig/develop/synop_datareader
Conversation
@clessig should I also complete the open TODOs mentioned in the code comments as part of this PR?
Yes, this would be great! Let me know if you have questions about this.
Yes, I'm not sure what exactly is meant by caching the mean and standard deviation. It looks like they are already stored in the object. Do you also want them stored in the dataset file so they are available the next time the dataset is read?
They are computed on the fly if they are not present in the data file. That's expensive and not an option if the dataset is bigger. We should also not modify the existing data file; it should be considered immutable. We could generate an auxiliary file with the statistics, but then we need to decide where to store it.
It looks like they are always computed on the fly, even when they are present in the data file. I think the best way to handle this is to have a single file that stores a hashmap-like structure, with the datasets as keys and their corresponding properties as values. This is the simplest solution to manage and lets us seamlessly append datasets and properties to the file.
Having one file is appealing at first sight, but it comes with a lot of question marks: most datasets already contain the information, and we shouldn't duplicate it since we otherwise run the risk that it becomes inconsistent. Having one file for all other datasets would mean we need to copy it to all other HPCs whenever something changes for one dataset. This also seems sub-optimal. My suggestion is to have a stream config argument that specifies which field in the data file contains the mean and stdev. If it's not specified or does not exist, then we fall back to computing it.
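A minimal sketch of the fallback suggested in the last comment, not the datareader's actual code. Everything named here is an assumption made for illustration: the config key `stats_field`, the `mean`/`stdev` entries, the `data` key, and the use of a plain dict as a stand-in for the opened data file.

```python
from statistics import fmean, pstdev


def load_normalization(data_file, stream_cfg, data_key="data"):
    """Return (mean, stdev, source) for one stream.

    data_file:  dict-like stand-in for the opened data file (hypothetical layout)
    stream_cfg: dict-like stream config; "stats_field" is a hypothetical key
    """
    stats_field = stream_cfg.get("stats_field")

    # Preferred path: read the statistics from the field named in the config.
    # The data file is only read here, never modified.
    if stats_field is not None and stats_field in data_file:
        stats = data_file[stats_field]
        return list(stats["mean"]), list(stats["stdev"]), "file"

    # Fallback: the field is not configured or does not exist, so compute
    # per-channel statistics on the fly (expensive for large datasets).
    columns = list(zip(*data_file[data_key]))  # rows are samples, columns are channels
    mean = [fmean(col) for col in columns]
    stdev = [pstdev(col) for col in columns]
    return mean, stdev, "computed"
```

With a stream config such as `{"stats_field": "stats"}` the stored values would be used; with an empty config, or a field name that is absent from the file, the function falls back to computing them. Returning the source alongside the values makes it easy to log which path was taken.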
Description
This PR changes the SYNOP datareader so that it also works on German SYNOP station data instead of only the data from MetNorway. The following bullet points explain all of my changes.
Issue Number
This PR closes no issue, but is related to #862
Is this PR a draft? Mark it as draft.
Checklist before asking for review
- `./scripts/actions.sh lint`
- `./scripts/actions.sh unit-test`
- `./scripts/actions.sh integration-test`
- `launch-slurm.py --time 60`