feat(training): decouple training method & task #896

Open
JPXKQX wants to merge 333 commits into main from feat/task-refactor

@JPXKQX JPXKQX commented Feb 13, 2026

Description

This is a work in progress (#887).

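| Method \ Task     | Forecasting | Autoencoding | TemporalDownscaling |
|-------------------|-------------|--------------|---------------------|
| Single            |             |              |                     |
| Ensemble          |             |              |                     |
| Diffusion         |             |              |                     |
| TendencyDiffusion |             |              |                     |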

Proposed (minor) changes:

  • Rollout configuration moves to the task, so it only needs to be specified for forecasting tasks. validation_rollout is now also specified under the task definition. rollout.max is renamed to rollout.maximum, because max is a Python built-in function and is best not reused as a variable or attribute name.
  • model_task has been renamed to training_method.
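
For existing configs, the two changes above amount to a small key migration. The following is a minimal sketch on plain dictionaries, not the code in this PR. It assumes the old keys live at training.model_task and training.rollout, and that the new ones live at training.training_method and task.rollout; those paths, the helper name migrate_config and the start / epoch_increment entries are illustrative assumptions.

```python
from copy import deepcopy


def migrate_config(old: dict) -> dict:
    """Return a copy of `old` rewritten to the proposed layout.

    Assumed (hypothetical) old paths: training.model_task, training.rollout.max
    Assumed (hypothetical) new paths: training.training_method, task.rollout.maximum
    """
    new = deepcopy(old)  # never mutate the caller's config
    training = new.setdefault("training", {})

    # model_task -> training_method
    if "model_task" in training:
        training["training_method"] = training.pop("model_task")

    # The rollout block moves from the training section to the task section.
    if "rollout" in training:
        rollout = training.pop("rollout")
        # rollout.max -> rollout.maximum
        if "max" in rollout:
            rollout["maximum"] = rollout.pop("max")
        new.setdefault("task", {})["rollout"] = rollout

    return new


# Illustrative input only; the values are made up for the example.
old = {
    "training": {
        "model_task": "forecaster",
        "rollout": {"start": 1, "epoch_increment": 0, "max": 12},
    }
}
new = migrate_config(old)
print(new)
```

Working on a deep copy keeps the original config available for comparison, which helps when reporting what was changed.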

Forecasting

anemoi-training train --config-name=debug
anemoi-training train --config-name=ensemble_crps
anemoi-training train --config-name=diffusion

Autoencoder

anemoi-training train --config-name=autoencoder training=single model=transformer
anemoi-training train --config-name=autoencoder training=ensemble model=transformer_ens +system.hardware.num_gpus_per_ensemble=1
anemoi-training train --config-name=autoencoder training=diffusion model=transformer_diffusion

Temporal downscaling

anemoi-training train --config-name=interpolator_multiout training=single model=gnn
anemoi-training train --config-name=interpolator_multiout training=ensemble model=graphtransformer_ens system.hardware.num_gpus_per_ensemble=1
anemoi-training train --config-name=interpolator_multiout training=diffusion model=graphtransformer_diffusion
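
The example commands above cover only part of the method/task matrix. A minimal sketch for listing which cells have an example command in this description is given below. The method and task names come from the matrix; mapping the debug, ensemble_crps and diffusion configs to the Single, Ensemble and Diffusion forecasting cells is an assumption based on the config names, and the EXAMPLES dictionary and coverage helper are hypothetical.

```python
from itertools import product

METHODS = ["Single", "Ensemble", "Diffusion", "TendencyDiffusion"]
TASKS = ["Forecasting", "Autoencoding", "TemporalDownscaling"]

# (method, task) -> config name and training override used in the example commands.
# The three Forecasting entries are an assumption inferred from the config names.
EXAMPLES = {
    ("Single", "Forecasting"): "debug",
    ("Ensemble", "Forecasting"): "ensemble_crps",
    ("Diffusion", "Forecasting"): "diffusion",
    ("Single", "Autoencoding"): "autoencoder training=single",
    ("Ensemble", "Autoencoding"): "autoencoder training=ensemble",
    ("Diffusion", "Autoencoding"): "autoencoder training=diffusion",
    ("Single", "TemporalDownscaling"): "interpolator_multiout training=single",
    ("Ensemble", "TemporalDownscaling"): "interpolator_multiout training=ensemble",
    ("Diffusion", "TemporalDownscaling"): "interpolator_multiout training=diffusion",
}


def coverage() -> dict:
    """Map every (method, task) cell to its example command, or None if there is none."""
    return {cell: EXAMPLES.get(cell) for cell in product(METHODS, TASKS)}


# Cells of the matrix without an example command in this description.
missing = [cell for cell, command in coverage().items() if command is None]
for method, task in missing:
    print(f"no example command for {method} x {task}")
```

Having an example command does not mean a combination is tested; the same table could be driven from the test suite instead.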

TODOs

  • Make fcstep optional in the ensemble model. It doesn't make sense for crps-autoencoder/downscaling/multiouttimeinterpolation.
  • Check the time_indices passed to reader.get_sample(). Cast them to a slice when possible to avoid tensor copies.
  • Update plot adapters
  • Update docs
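
For the time_indices TODO, a possible shape of the cast is sketched below. In NumPy and PyTorch, indexing with a slice returns a view, while indexing with a list or tensor of integers makes a copy, so evenly spaced indices are worth converting. The helper name as_slice_if_regular is hypothetical, and the sketch makes no assumption about the signature of reader.get_sample().

```python
from typing import Sequence, Union


def as_slice_if_regular(indices: Sequence[int]) -> Union[slice, list]:
    """Return an equivalent slice for evenly spaced, increasing, non-negative
    indices; otherwise return the indices unchanged as a list."""
    idx = list(indices)
    # Negative indices wrap around, so they cannot be expressed safely as a slice here.
    if not idx or any(i < 0 for i in idx):
        return idx
    if len(idx) == 1:
        return slice(idx[0], idx[0] + 1)

    step = idx[1] - idx[0]
    # Only handle strictly increasing, constant-step sequences.
    if step <= 0 or any(b - a != step for a, b in zip(idx, idx[1:])):
        return idx
    return slice(idx[0], idx[-1] + 1, step)


data = list(range(10))
regular = as_slice_if_regular([0, 2, 4])    # evenly spaced -> slice
irregular = as_slice_if_regular([0, 1, 3])  # uneven spacing -> list kept as-is
print(regular, data[regular], irregular)
```

Falling back to the original list for irregular or negative indices keeps the behaviour unchanged in the cases a slice cannot represent.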

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.


📚 Documentation preview 📚: https://anemoi-training--896.org.readthedocs.build/en/896/


📚 Documentation preview 📚: https://anemoi-graphs--896.org.readthedocs.build/en/896/


📚 Documentation preview 📚: https://anemoi-models--896.org.readthedocs.build/en/896/

@anaprietonem anaprietonem added the "ATS Approval Needed" label and removed the "ATS Approval Not Needed" label Feb 19, 2026
@JPXKQX JPXKQX marked this pull request as ready for review February 19, 2026 13:22
@anaprietonem anaprietonem added the "ATS Approved" label and removed the "ATS Approval Needed" label Feb 25, 2026
@github-actions github-actions bot added the "bug" label Feb 25, 2026
@anaprietonem
Contributor

@ecmwf/anemoi_technical_subgroup approved this PR. This refactor should come with clear docs and guidance for users on how to use it. It should also be clear which combinations in the above matrix are being tested and which are not.

In terms of breaking changes: if new config entries are added and they are breaking, there should be a meaningful error message or guidance on what to do.

@mc4117 mc4117 added this to the Time-Interpolation milestone Mar 6, 2026
@mc4117 mc4117 removed the "bug" label Apr 10, 2026
@github-actions github-actions bot added the "bug" label Apr 10, 2026
Labels

ATS Approved, bug, graphs, models, training

Projects

Status: Under Review

10 participants