Issue1659 forecast time index issue by using timerange instead of index created by step_hr #1665

ankitpatnala · 2026-01-21T10:26:44Z

Description

We have indexing issue if step_hr > len_hr. We have repetitive index.
Before the index was based on step_hr. Now, in this PR, it has been change to DTRange.

This change introduces how we get data from datareader

Issue Number

Closes #1659

Is this PR a draft? Mark it as draft.

Still need to test

Need to adapt the change to other datareader type ("FESOM", "Obs")
Failed while using num_input_steps : 2

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a hegdedoc in the github issue with all the configurations and runs for this experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

…ute timerange

ankitpatnala · 2026-01-21T10:45:58Z

@clessig I just created a draft PR for forecast time index issue.
I made it a draft PR as I still need to address 2 things (mentioned in the comment)

clessig

Please have a look at my comment on _get(). Why should/could the index change there?

clessig · 2026-01-21T16:53:14Z

src/weathergen/datasets/data_reader_anemoi.py


    @override
-    def _get(self, idx: TIndex, channels_idx: list[int]) -> ReaderData:
+    def _get(self, t_range: DTRange, channels_idx: list[int]) -> ReaderData:


The _get() function is derived and defined in the base class--we cannot just change it here. It's signature is also something that has been thought about, so any change would have to be discussed first. self._get_dataset_idxs(idx) also takes a TIndex as argument and not a DTRange:

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Line 690 in 2abc7df

def _get_dataset_idxs(self, idx: TIndex) -> tuple[NDArray[np.int64], DTRange]:

Working directly on indices based on step_hr makes it difficult to obtain the right window.
We have to index them based on forecast_hr then and sampling with a frequency ($step\_hr</code> / <code>forecast\_hr$)

We specifically abstracted all the functionality to map from time windows to the indices for each (time-stepped) dataset here:

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Line 719 in 2abc7df

def get_dataset_indexes_timestep(

Currently I don't see a reason to change this.

The dtr is picked from index at this line.

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Line 750 in 2abc7df

dtr = tw_handler.window(idx)

with step_hr = 24:00:00, the consecutive indices are now 24h apart

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Lines 128 to 147 in 2abc7df

def window(self, idx: TIndex) -> DTRange:

"""

Temporal window corresponding to index

Parameters

----------

idx :

index of temporal window

Returns

-------

start and end of temporal window

"""

t_start_win = self.t_start + self.t_window_step * idx

t_end_win = t_start_win + self.t_window_len

return DTRange(t_start_win, t_end_win)

as len_hr is still 06:00:00 it looks for time range within next 6 hours. So some time_windows are not mapped to any indices in such case.

index 1 : day1 00:00:00 - 06:00:00 index 2: day2 00:00:00 - 06:00:00

Alternate can be we need to change TIndex from np.int to float type to look for right window

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Line 142 in 2abc7df

t_start_win = self.t_start + self.t_window_step * idx

The dtr is picked from index at this line.

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Line 750 in 2abc7df

dtr = tw_handler.window(idx)

with step_hr = 24:00:00, the consecutive indices are now 24h apart

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Lines 128 to 147 in 2abc7df

def window(self, idx: TIndex) -> DTRange:

"""

Temporal window corresponding to index

Parameters

----------

idx :

index of temporal window

Returns

-------

start and end of temporal window

"""

t_start_win = self.t_start + self.t_window_step * idx

t_end_win = t_start_win + self.t_window_len

return DTRange(t_start_win, t_end_win)

as len_hr is still 06:00:00 it looks for time range within next 6 hours. So some time_windows are not mapped to any indices in such case.

index 1 : day1 00:00:00 - 06:00:00 index 2: day2 00:00:00 - 06:00:00

Alternate can be we need to change TIndex from np.int to float type to look for right window

WeatherGenerator/src/weathergen/datasets/data_reader_base.py

Line 142 in 2abc7df

t_start_win = self.t_start + self.t_window_step * idx

If step_hrs=24h and len_hrs=6:00 then this is the correct behaviour:

index 1 : day1 00:00:00 - 06:00:00
index 2: day2 00:00:00 - 06:00:00

index1 + 1 forecast_step shouldn't it give us : day1 06:00:00 - 12:00:00 ?
there is no index mapping to this time_window then.

WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py

Lines 414 to 416 in 2abc7df

step_forecast_dt = idx + (self.forecast_delta_dt * fstep) // self.step_timedelta

time_win_target = self.time_window_handler.window(step_forecast_dt)

WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py

Lines 536 to 540 in 2abc7df

output_data = []

for fstep in range(self.forecast_offset, self.forecast_offset + forecast_dt + 1):

step_forecast_dt = base_idx + (self.forecast_delta_dt * fstep) // self.step_timedelta

rdata = collect_datasources(stream_ds, step_forecast_dt, "target")

Forecast data is also obtained by index.

WeatherGenerator/src/weathergen/datasets/multi_stream_data_sampler.py

Lines 44 to 65 in 2abc7df

def collect_datasources(stream_datasets: list, idx: int, type: str) -> IOReaderData:

"""

Utility function to collect all sources / targets from streams list

"""

rdatas = []

for ds in stream_datasets:

if type == "source":

get_reader_data = ds.get_source

normalize_channels = ds.normalize_source_channels

elif type == "target":

get_reader_data = ds.get_target

normalize_channels = ds.normalize_target_channels

else:

assert False, "invalid value for argument `type`"

# get source (of potentially multi-step length)

rdata = get_reader_data(idx).remove_nan_coords()

rdata.data = normalize_channels(rdata.data)

rdata.geoinfos = ds.normalize_geoinfos(rdata.geoinfos)

rdatas += [rdata]

the index from this is then called to ds.get_target which does not find forecast time_window

Patnala,Ankit added 3 commits January 21, 2026 09:19

Updated to adjust step_hr > len_hr by changing it from index to absol…

b9e048c

…ute timerange

checked out the develop version

506ed9d

removed commented and spurious lines

ff33c71

github-project-automation bot added this to WeatherGen-dev Jan 21, 2026

lint check and formatted

d3be4dd

ankitpatnala changed the title ~~Ankit issue1659 forecast time index issue by using timerange instead of index created by step_hr~~ Issue1659 forecast time index issue by using timerange instead of index created by step_hr Jan 21, 2026

clessig reviewed Jan 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue1659 forecast time index issue by using timerange instead of index created by step_hr #1665

Issue1659 forecast time index issue by using timerange instead of index created by step_hr #1665

ankitpatnala commented Jan 21, 2026 •

edited

Loading

Uh oh!

ankitpatnala commented Jan 21, 2026

Uh oh!

clessig left a comment

Uh oh!

clessig Jan 21, 2026

Uh oh!

ankitpatnala Jan 21, 2026 •

edited

Loading

Uh oh!

clessig Jan 22, 2026 •

edited

Loading

Uh oh!

ankitpatnala Jan 22, 2026 •

edited

Loading

Uh oh!

clessig Jan 22, 2026

Uh oh!

ankitpatnala Jan 22, 2026 •

edited

Loading

Uh oh!

ankitpatnala Jan 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	def window(self, idx: TIndex) -> DTRange:
	"""
	Temporal window corresponding to index

	Parameters
	----------
	idx :
	index of temporal window

	Returns
	-------
	start and end of temporal window
	"""

	t_start_win = self.t_start + self.t_window_step * idx
	t_end_win = t_start_win + self.t_window_len

	return DTRange(t_start_win, t_end_win)

	step_forecast_dt = idx + (self.forecast_delta_dt * fstep) // self.step_timedelta
	time_win_target = self.time_window_handler.window(step_forecast_dt)

	output_data = []
	for fstep in range(self.forecast_offset, self.forecast_offset + forecast_dt + 1):
	step_forecast_dt = base_idx + (self.forecast_delta_dt * fstep) // self.step_timedelta

	rdata = collect_datasources(stream_ds, step_forecast_dt, "target")

	def collect_datasources(stream_datasets: list, idx: int, type: str) -> IOReaderData:
	"""
	Utility function to collect all sources / targets from streams list
	"""

	rdatas = []

	for ds in stream_datasets:
	if type == "source":
	get_reader_data = ds.get_source
	normalize_channels = ds.normalize_source_channels
	elif type == "target":
	get_reader_data = ds.get_target
	normalize_channels = ds.normalize_target_channels
	else:
	assert False, "invalid value for argument `type`"

	# get source (of potentially multi-step length)
	rdata = get_reader_data(idx).remove_nan_coords()
	rdata.data = normalize_channels(rdata.data)
	rdata.geoinfos = ds.normalize_geoinfos(rdata.geoinfos)
	rdatas += [rdata]

Issue1659 forecast time index issue by using timerange instead of index created by step_hr #1665

Are you sure you want to change the base?

Issue1659 forecast time index issue by using timerange instead of index created by step_hr #1665

Conversation

ankitpatnala commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Issue Number

Still need to test

Checklist before asking for review

Uh oh!

ankitpatnala commented Jan 21, 2026

Uh oh!

clessig left a comment

Choose a reason for hiding this comment

Uh oh!

clessig Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

ankitpatnala Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clessig Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ankitpatnala Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clessig Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

ankitpatnala Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ankitpatnala Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ankitpatnala commented Jan 21, 2026 •

edited

Loading

ankitpatnala Jan 21, 2026 •

edited

Loading

clessig Jan 22, 2026 •

edited

Loading

ankitpatnala Jan 22, 2026 •

edited

Loading

ankitpatnala Jan 22, 2026 •

edited

Loading

ankitpatnala Jan 22, 2026 •

edited

Loading