1654 Fixing forecast steps in model, loss, and data loader #1656

Jubeku · 2026-01-19T16:08:36Z

Description

This PR makes sure that:

- In forecast mode, len(pred) = len(forecast_steps) + forecast_offset.
- batch.get_forecast_steps() corresponds to forecast.num_steps.
- The loops over forecast steps in the model forward pass and in the loss calculator run from forecast_offset to forecast_offset + num_forecast_steps.
- The tokens pass through the forecasting engine before they go through the decoder.
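
A minimal sketch of the indexing described above, not the actual model code. It assumes that the skipped leading steps are kept as placeholders, so that the prediction list has length forecast_offset + num_forecast_steps. The names forecast_indices, rollout, engine, and decoder are hypothetical.

```python
def forecast_indices(forecast_offset: int, num_forecast_steps: int) -> list[int]:
    """Absolute step indices visited by the forecast loops."""
    return list(range(forecast_offset, forecast_offset + num_forecast_steps))


def rollout(tokens, forecast_offset, num_forecast_steps, engine, decoder):
    """Toy rollout: the forecasting engine is applied before the decoder."""
    # Assumption: steps before the offset are kept as None placeholders.
    preds = [None] * forecast_offset
    for _ in forecast_indices(forecast_offset, num_forecast_steps):
        tokens = engine(tokens)          # advance the latent state first
        preds.append(decoder(tokens))    # then decode the advanced state
    return preds


# Toy engine and decoder acting on plain integers.
preds = rollout(0, forecast_offset=1, num_forecast_steps=3,
                engine=lambda t: t + 1, decoder=lambda t: t * 10)
print(len(preds))  # forecast_offset + num_forecast_steps
```

With forecast_offset = 0 the placeholder list is empty and the loop visits steps 0 to num_forecast_steps - 1.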

Issue Number

Closes #1654

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a hedgedoc in the github issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

Jubeku · 2026-01-22T11:06:25Z

src/weathergen/datasets/batch.py

-            forecast_dt = sdata.get_forecast_steps()
-        return forecast_dt
+            forecast_steps = sdata.get_forecast_steps()
+        return forecast_steps


I am still a bit concerned about the for loops here, in which the variable is overwritten at each iteration. I assume both forecast_steps and output_len are constant within a batch across all streams? If so, we can probably use that to handle it better.

MatKbauer

Looks good so far. I'd like to make one major suggestion: Can we rename forecast_steps to forecast_idxs (or forecast_idcs) to be very explicit in our naming? I would even go down to changing fstep to fidx then.

Most of my comments in the review relate to this suggested change in naming convention.

MatKbauer · 2026-01-22T10:31:19Z

src/weathergen/datasets/batch.py

-            forecast_dt = sdata.get_forecast_steps()
-        return forecast_dt
+            forecast_steps = sdata.get_forecast_steps()
+        return forecast_steps


I still don't quite understand why we have to loop over the streams here and overwrite output_len and forecast_steps, respectively. In a multi-stream setup, output_len and forecast_steps will only be returned for the last stream. So why loop over the streams instead of returning the values of the last one, if that is what we want?

In line with comments below, I'd advocate for get_num_forecast_steps() instead of get_forecast_steps()
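
One way to make the assumption explicit, instead of silently keeping the last stream's value, is sketched below. common_value is a hypothetical helper that is not part of the code base, and the stream names and values are made up.

```python
def common_value(values, name):
    """Return the single value shared by all streams, or raise if they differ."""
    unique = set(values)
    if len(unique) != 1:
        raise ValueError(f"{name} differs across streams: {sorted(unique)}")
    return unique.pop()


# Hypothetical per-stream values collected from one batch.
per_stream_num_steps = {"stream_a": 2, "stream_b": 2}
num_forecast_steps = common_value(per_stream_num_steps.values(), "num_forecast_steps")
print(num_forecast_steps)
```

If the values are indeed constant across streams, this behaves like returning the last one, but a mismatch fails loudly instead of being overwritten.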

MatKbauer · 2026-01-22T10:41:51Z

src/weathergen/datasets/multi_stream_data_sampler.py

+        dt = self._get_output_length(forecast_dt)
+        stream_data = StreamData(
+            base_idx, num_steps_input, dt, forecast_dt, self.forecast_offset, self.num_healpix_cells
+        )


Can we rename

num_steps_input to num_input_steps,

dt to num_forecast_steps, and

forecast_dt to forecast_steps?

MatKbauer · 2026-01-22T10:47:35Z

src/weathergen/datasets/stream_data.py

        Get number of forecast steps
        """
-        return self.forecast_steps
+        return self.forecast_idxs


Rename the function from get_forecast_steps() to get_forecast_idxs() and adapt the function header. (Side note: I'd prefer idcs over idxs, but we already use idxs in various places in the code.)

MatKbauer · 2026-01-22T10:53:54Z

src/weathergen/datasets/stream_data.py


        self.input_steps = input_steps
-        self.forecast_steps = forecast_steps
+        self.output_steps = output_steps


Can we use self.num_output_steps and self.num_input_steps to be consistent with num_forecast_steps in the multi_stream_data_sampler?

MatKbauer · 2026-01-22T10:54:56Z

src/weathergen/datasets/stream_data.py

+        """
+        Get length of output
+        """
+        return self.output_steps


return self.num_output_steps

MatKbauer · 2026-01-22T11:09:48Z

src/weathergen/train/target_and_aux_module_base.py

+        self.latent = [{} for _ in range(len_target)]
+        self.aux_outputs = {}
+
+    def add_physical_target(self, fstep: int, stream_name: StreamName, pred: torch.Tensor) -> None:


MatKbauer · 2026-01-22T11:09:50Z

src/weathergen/train/target_and_aux_module_base.py

+    latent: list[dict[str, torch.Tensor | LatentState]]
    aux_outputs: dict[str, torch.Tensor]

+    def __init__(self, len_target: int, forecast_steps: list) -> None:


forecast_idxs

MatKbauer · 2026-01-22T11:11:29Z

src/weathergen/train/target_and_aux_module_base.py

        # collect all targets, concatenating across batch dimension since this is also how it
        # happens for predictions in the model
-        targets = {}
+        fstep_idxs = [0] if len(forecast_steps) == 0 else forecast_steps


How about using fidxs here and fidx in the forecast loop below?

I also considered something like target_idxs to make clear that this holds not only for forecasting. But target_idx is also used later in the loss module to refer to the source-target correspondence, so it would invite risky confusion.

MatKbauer · 2026-01-22T11:12:17Z

src/weathergen/train/target_and_aux_module_base.py

+    def compute(self, bidx, batch, model_params, model) -> TargetAuxOutput:
        # TODO: properly retrieve/define these
        stream_names = [k for k, _ in batch.samples[0].streams_data.items()]
        forecast_steps = batch.get_forecast_steps()


forecast_idxs = batch.get_forecast_idxs()

MatKbauer · 2026-01-22T11:14:54Z

src/weathergen/utils/validation_io.py

-    for fstep in range(window_offset_prediction, forecast_steps):
+    # TODO why does this stop at forecast_steps? Maybe explains #1657
+    # for fstep in range(forecast_offset, forecast_steps + 1):
+    for fstep in range(forecast_offset, forecast_steps):


Crucial to keep in mind. Maybe have to extend this to +1 here or iterate over forecast_idxs
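
To illustrate the concern, here is a small sketch that assumes forecast_steps is a count of steps (as num_forecast_steps is elsewhere in this PR). The two helper functions are hypothetical. With a non-zero offset, the loop from the diff visits forecast_offset fewer steps than an offset-aware range.

```python
def fsteps_as_in_diff(forecast_offset: int, forecast_steps: int) -> list[int]:
    """Steps visited by range(forecast_offset, forecast_steps)."""
    return list(range(forecast_offset, forecast_steps))


def fsteps_offset_aware(forecast_offset: int, num_forecast_steps: int) -> list[int]:
    """Steps visited when the upper bound accounts for the offset."""
    return list(range(forecast_offset, forecast_offset + num_forecast_steps))


for offset in (0, 1):
    print(offset, fsteps_as_in_diff(offset, 2), fsteps_offset_aware(offset, 2))
```

The two ranges agree for offset 0 and differ by exactly forecast_offset entries otherwise.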

Jubeku · 2026-01-22T16:19:28Z

src/weathergen/datasets/multi_stream_data_sampler.py

        self.rng = np.random.default_rng(self.data_loader_rng_seed)

        fsm = (
-            self.forecast_steps[min(self.mini_epoch, len(self.forecast_steps) - 1)]


@clessig, why is len(self.forecast_steps) used here?
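
For reference, the expression mechanically clamps the index to the last entry of the sequence. The sketch below assumes that forecast_steps holds one entry per mini-epoch, which this thread does not confirm; steps_for_mini_epoch and the schedule values are hypothetical.

```python
def steps_for_mini_epoch(schedule: list[int], mini_epoch: int) -> int:
    """Pick the schedule entry for this mini-epoch, reusing the last entry afterwards."""
    return schedule[min(mini_epoch, len(schedule) - 1)]


schedule = [1, 2, 4]  # hypothetical forecast-step schedule
print([steps_for_mini_epoch(schedule, e) for e in range(5)])
```

Without the min(...) clamp, any mini_epoch beyond the end of the schedule would raise an IndexError.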

Jubeku · 2026-01-22T16:20:13Z

src/weathergen/datasets/multi_stream_data_sampler.py

        elif self.forecast_policy == "random" or self.forecast_policy == "sequential_random":
            # randint high=one-past
-            self.perms_forecast_dt = self.rng.integers(
-                low=self.forecast_steps.min(), high=fsm + 1, size=len_dt_samples, dtype=np.int64


Why is high=fsm+1 used here?
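
On the mechanics: NumPy's Generator.integers treats high as exclusive by default, which matches the "one-past" comment in the diff, so high=fsm + 1 is what makes fsm itself drawable. The sketch below uses the standard library's random.randrange, whose upper bound is likewise exclusive, as a stand-in for the NumPy call; draw_forecast_steps is a hypothetical helper.

```python
import random


def draw_forecast_steps(low: int, high: int, size: int, seed: int = 0) -> list[int]:
    """Draw `size` integers from [low, high), i.e. with an exclusive upper bound."""
    rng = random.Random(seed)
    return [rng.randrange(low, high) for _ in range(size)]


fsm = 3
without_plus_one = draw_forecast_steps(1, fsm, size=1000)   # fsm can never be drawn
with_plus_one = draw_forecast_steps(1, fsm + 1, size=1000)  # fsm is included
print(max(without_plus_one), max(with_plus_one))
```

Whether fsm should be included is a design question for the sampler; the sketch only shows what the +1 changes.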

MatKbauer

Works well for me when testing forecasting with offset: 0 or 1 and different num_steps. I launched a 2-node forecast pre-training (2 forecast steps) and would like to quickly verify a fine-tuning with 8 forecast steps afterwards.

first-pass fixing fsteps

815ae67

Jubeku self-assigned this Jan 19, 2026

github-project-automation bot added this to WeatherGen-dev Jan 19, 2026

Jubeku mentioned this pull request Jan 19, 2026

Inference is generating wrong number of forecast steps #1657

Open

Jubeku added 7 commits January 20, 2026 14:09

correct forecast loop condition

605b91c

update ModelOutput definition

a3860ec

rm breakpoint

075ccc1

mv offset to forecast cfg and make restructure targets

b2a1b4d

fix forecast_offset handling in data sampler

ed748a8

make sure ssl training is working without specifying train_cfg.forecast

bb33788

fix output lenght in dataloader, get rid of passing offset through tr…

42e583f

…ainer... and entire Rattenschwanz

Jubeku commented Jan 22, 2026

MatKbauer reviewed Jan 22, 2026

Jubeku added 4 commits January 22, 2026 13:17

correct validation io

cedce5b

loop over preds and targets in loss module and rename fstep_idxs to t…

119cf96

…imeste_idxs in targetaux

rename forecast_steps

af80310

correct loop in dataloader and lint

efe8ead

Jubeku commented Jan 22, 2026

Jubeku marked this pull request as ready for review January 22, 2026 16:39

Merge branch 'develop' into jk/develop/1654_fix_fsteps

1717a7e

MatKbauer approved these changes Jan 23, 2026

Jubeku added 2 commits January 23, 2026 15:02

update forecast policy and step validation

726bab5

add mode to forecast config validation output

d06c850

clessig mentioned this pull request Jan 25, 2026

Merged forecast step fixes and SSL fixes #1690

Draft

4 tasks

Fixing missing adjustement of function name

5f73cc2


4 participants
