
@TillHae (Contributor) commented Jan 20, 2026

Description

This PR changes the SYNOP data reader so that it also works on German SYNOP station data, not just the data from MetNorway. The following bullet points explain all of my changes.

  • Added a height dimension, which affected the base class as well
  • The German data does not require explicit missing-value handling, as missing values are already encoded as np.nan (e.g. ll. 164-166)
  • The German data is structured by time and station_id; the geoinfo has to be flattened to 1-D before use (ll. 105-113)
  • The German data dimensions needed to be transposed, as they are (variables, spatial, time) instead of (variables, time, spatial) (ll. 231-238); see the sketch below
  • I've added a file for testing, which will be removed after the review (the test only works on JUWELS, as the underlying data is only available there)
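
For reviewers, below is a minimal NumPy sketch of the flattening and transpose steps described above; the array names and shapes are illustrative placeholders, not the actual reader code.

```python
import numpy as np

# Illustrative only: shapes and names are made up for this sketch.

# The German geoinfo comes per (time, station_id) and is flattened to 1-D before use.
lat = np.random.rand(4, 10)           # (time, station_id), placeholder values
lat_flat = lat.ravel()                # 1-D, length time * station_id

# The German data arrives as (variables, spatial, time) and is transposed to the
# (variables, time, spatial) layout expected by the base class.
data = np.random.rand(5, 10, 4)       # (variables, spatial, time)
data = np.transpose(data, (0, 2, 1))  # -> (variables, time, spatial)

# Missing values are already np.nan in the German data, so no extra masking is needed.
```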

Issue Number

This PR closes no issue, but is related to #862

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written the run_id(s) in the comment: launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@TillHae (Contributor, Author) commented Jan 20, 2026

@wael-mika

@TillHae (Contributor, Author) commented Jan 22, 2026

@clessig should I also complete the open TODOs that are mentioned in the comments as part of this PR?

@clessig (Collaborator) commented Jan 22, 2026

> @clessig should I also complete the open TODOs that are mentioned in the comments as part of this PR?

Yes, this would be great! Let me know if you have questions about this.

@TillHae (Contributor, Author) commented Jan 22, 2026

> @clessig should I also complete the open TODOs that are mentioned in the comments as part of this PR?

> Yes, this would be great! Let me know if you have questions about this.

Yes, I don't know what exactly is meant by the caching of the mean and standard deviation. It looks like they are already stored in the object. Do you also want them to be stored in the dataset file for the next time the dataset is read?

@clessig (Collaborator) commented Jan 22, 2026

> @clessig should I also complete the open TODOs that are mentioned in the comments as part of this PR?

> Yes, this would be great! Let me know if you have questions about this.

> Yes, I don't know what exactly is meant by the caching of the mean and standard deviation. It looks like they are already stored in the object. Do you also want them to be stored in the dataset file for the next time the dataset is read?

They are computed on the fly if they are not present in the data file. That's expensive and not an option if the dataset is bigger. We should also not modify the existing data file; it should be considered immutable. We could generate an auxiliary file with the data, but then we need to decide where to store it.

@TillHae (Contributor, Author) commented Jan 23, 2026

> @clessig should I also complete the open TODOs that are mentioned in the comments as part of this PR?

> Yes, this would be great! Let me know if you have questions about this.

> Yes, I don't know what exactly is meant by the caching of the mean and standard deviation. It looks like they are already stored in the object. Do you also want them to be stored in the dataset file for the next time the dataset is read?

> They are computed on the fly if they are not present in the data file. That's expensive and not an option if the dataset is bigger. We should also not modify the existing data file; it should be considered immutable. We could generate an auxiliary file with the data, but then we need to decide where to store it.

It looks like they are always computed on the fly, even when they are present in the data file. I think the best way to handle this is to have a single file storing a hashmap-like structure, with the datasets as keys and their corresponding properties as values. This is the simplest solution to manage and lets us seamlessly append datasets and properties to the file.
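
A minimal sketch of what I have in mind, with made-up file and function names:

```python
import json
from pathlib import Path

# Illustrative only: one cache file mapping dataset identifiers to their properties.
CACHE_FILE = Path("dataset_properties.json")

def load_properties(dataset_id: str):
    """Return the cached properties (e.g. mean/stdev) for a dataset, or None."""
    if not CACHE_FILE.exists():
        return None
    return json.loads(CACHE_FILE.read_text()).get(dataset_id)

def store_properties(dataset_id: str, properties: dict) -> None:
    """Add or update the properties of a dataset without touching its data file."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    cache[dataset_id] = properties
    CACHE_FILE.write_text(json.dumps(cache, indent=2))
```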

@clessig (Collaborator) commented Jan 25, 2026

> @clessig should I also complete the open TODOs that are mentioned in the comments as part of this PR?

> Yes, this would be great! Let me know if you have questions about this.

> Yes, I don't know what exactly is meant by the caching of the mean and standard deviation. It looks like they are already stored in the object. Do you also want them to be stored in the dataset file for the next time the dataset is read?

> They are computed on the fly if they are not present in the data file. That's expensive and not an option if the dataset is bigger. We should also not modify the existing data file; it should be considered immutable. We could generate an auxiliary file with the data, but then we need to decide where to store it.

> It looks like they are always computed on the fly, even when they are present in the data file. I think the best way to handle this is to have a single file storing a hashmap-like structure, with the datasets as keys and their corresponding properties as values. This is the simplest solution to manage and lets us seamlessly append datasets and properties to the file.

Having one file is appealing at first sight, but it comes with a lot of question marks: most datasets already contain the information, and we shouldn't duplicate it since we otherwise run the risk that it becomes inconsistent. Having one file for all datasets would also mean we need to copy it to all HPCs whenever something for one dataset changes. This also seems sub-optimal.

My suggestion is to have a stream config argument that specifies which field in the data file contains mean and stdev. If it's not specified or does not exist, then we fall back to computing it.
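
A minimal sketch of how that fallback could look; the config key norm_stats_field and the dataset access below are only illustrative, not the actual stream config or reader API:

```python
import numpy as np

def get_mean_stdev(ds, stream_cfg: dict, var: str):
    """Read mean/stdev from the configured field if present, else compute them."""
    field = stream_cfg.get("norm_stats_field")
    if field is not None and field in ds:
        # statistics are stored in the data file under the configured field
        return ds[field]["mean"][var], ds[field]["stdev"][var]
    # fall back to computing them on the fly
    values = np.asarray(ds[var], dtype=float)
    return float(np.nanmean(values)), float(np.nanstd(values))
```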
