-
Notifications
You must be signed in to change notification settings - Fork 51
Issue1659 forecast time index issue by using timerange instead of index created by step_hr #1665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Issue1659 forecast time index issue by using timerange instead of index created by step_hr #1665
Conversation
|
@clessig I just created a draft PR for forecast time index issue. |
clessig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have a look at my comment on _get(). Why should/could the index change there?
|
|
||
| @override | ||
| def _get(self, idx: TIndex, channels_idx: list[int]) -> ReaderData: | ||
| def _get(self, t_range: DTRange, channels_idx: list[int]) -> ReaderData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _get() function is derived and defined in the base class--we cannot just change it here. It's signature is also something that has been thought about, so any change would have to be discussed first. self._get_dataset_idxs(idx) also takes a TIndex as argument and not a DTRange:
| def _get_dataset_idxs(self, idx: TIndex) -> tuple[NDArray[np.int64], DTRange]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working directly on indices based on step_hr makes it difficult to obtain the right window.
We have to index them based on forecast_hr then and sampling with a frequency (
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We specifically abstracted all the functionality to map from time windows to the indices for each (time-stepped) dataset here:
| def get_dataset_indexes_timestep( |
Currently I don't see a reason to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dtr is picked from index at this line.
| dtr = tw_handler.window(idx) |
with step_hr = 24:00:00, the consecutive indices are now 24h apart
WeatherGenerator/src/weathergen/datasets/data_reader_base.py
Lines 128 to 147 in 2abc7df
| def window(self, idx: TIndex) -> DTRange: | |
| """ | |
| Temporal window corresponding to index | |
| Parameters | |
| ---------- | |
| idx : | |
| index of temporal window | |
| Returns | |
| ------- | |
| start and end of temporal window | |
| """ | |
| t_start_win = self.t_start + self.t_window_step * idx | |
| t_end_win = t_start_win + self.t_window_len | |
| return DTRange(t_start_win, t_end_win) | |
as len_hr is still 06:00:00 it looks for time range within next 6 hours. So some time_windows are not mapped to any indices in such case.
index 1 : day1 00:00:00 - 06:00:00
index 2: day2 00:00:00 - 06:00:00
Alternate can be we need to change TIndex from np.int to float type to look for right window
| t_start_win = self.t_start + self.t_window_step * idx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dtr is picked from index at this line.
dtr = tw_handler.window(idx) with
step_hr = 24:00:00, the consecutive indices are now 24h apartWeatherGenerator/src/weathergen/datasets/data_reader_base.py
Lines 128 to 147 in 2abc7df
def window(self, idx: TIndex) -> DTRange: """ Temporal window corresponding to index Parameters ---------- idx : index of temporal window Returns ------- start and end of temporal window """ t_start_win = self.t_start + self.t_window_step * idx t_end_win = t_start_win + self.t_window_len return DTRange(t_start_win, t_end_win) as
len_hris still06:00:00it looks for time range within next 6 hours. So sometime_windowsare not mapped to any indices in such case.index 1 : day1 00:00:00 - 06:00:00 index 2: day2 00:00:00 - 06:00:00Alternate can be we need to change
TIndexfromnp.inttofloattype to look for right window
t_start_win = self.t_start + self.t_window_step * idx
If step_hrs=24h and len_hrs=6:00 then this is the correct behaviour:
index 1 : day1 00:00:00 - 06:00:00
index 2: day2 00:00:00 - 06:00:00
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index1 + 1 forecast_step shouldn't it give us : day1 06:00:00 - 12:00:00 ?
there is no index mapping to this time_window then.
WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py
Lines 414 to 416 in 2abc7df
| step_forecast_dt = idx + (self.forecast_delta_dt * fstep) // self.step_timedelta | |
| time_win_target = self.time_window_handler.window(step_forecast_dt) | |
WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py
Lines 536 to 540 in 2abc7df
| output_data = [] | |
| for fstep in range(self.forecast_offset, self.forecast_offset + forecast_dt + 1): | |
| step_forecast_dt = base_idx + (self.forecast_delta_dt * fstep) // self.step_timedelta | |
| rdata = collect_datasources(stream_ds, step_forecast_dt, "target") |
Forecast data is also obtained by index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py
Lines 44 to 65 in 2abc7df
| def collect_datasources(stream_datasets: list, idx: int, type: str) -> IOReaderData: | |
| """ | |
| Utility function to collect all sources / targets from streams list | |
| """ | |
| rdatas = [] | |
| for ds in stream_datasets: | |
| if type == "source": | |
| get_reader_data = ds.get_source | |
| normalize_channels = ds.normalize_source_channels | |
| elif type == "target": | |
| get_reader_data = ds.get_target | |
| normalize_channels = ds.normalize_target_channels | |
| else: | |
| assert False, "invalid value for argument `type`" | |
| # get source (of potentially multi-step length) | |
| rdata = get_reader_data(idx).remove_nan_coords() | |
| rdata.data = normalize_channels(rdata.data) | |
| rdata.geoinfos = ds.normalize_geoinfos(rdata.geoinfos) | |
| rdatas += [rdata] |
the index from this is then called to ds.get_target which does not find forecast time_window
Description
We have indexing issue if
step_hr>len_hr. We have repetitive index.Before the index was based on
step_hr. Now, in this PR, it has been change toDTRange.This change introduces how we get data from
datareaderIssue Number
Closes #1659
Is this PR a draft? Mark it as draft.
Still need to test
Need to adapt the change to other datareader type ("FESOM", "Obs")
Failed while using
num_input_steps : 2Checklist before asking for review
./scripts/actions.sh lint./scripts/actions.sh unit-test./scripts/actions.sh integration-testlaunch-slurm.py --time 60