feat(graphs): Introduce RadiusAreaMaskBuilder #1004

Open
mpvginde wants to merge 6 commits into main from feat/refactor_knnareamaskbuilder

@mpvginde
Contributor

@mpvginde mpvginde commented Mar 23, 2026

When building a Stretched Grid or LAM hidden mesh from a refined icosahedron, the mesh structure is defined by a KNNAreaMaskBuilder, which determines which hidden nodes fall under the high-resolution part of the data mesh.

This is currently done with a nearest-neighbour search using sklearn.neighbors.NearestNeighbors. At very high refinement levels (>10), this becomes a bottleneck in graph building (~1h).

This PR adds a new MaskBuilder class, RadiusAreaMaskBuilder, which uses a radius search: torch_geometric.nn.radius when torch-cluster is installed, and scipy.spatial.cKDTree otherwise.
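
For readers unfamiliar with the approach, here is a minimal sketch of what a radius-based area mask could look like on the scipy.spatial.cKDTree path. It is not the code in this PR: the names `radius_area_mask`, `latlon_to_xyz` and `EARTH_RADIUS_KM` are hypothetical, and the conversion of the margin from kilometres to a chord length on the unit sphere is an assumption about how the radius might be handled.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (hypothetical constant)


def latlon_to_xyz(coords_rad: np.ndarray) -> np.ndarray:
    """Map (lat, lon) pairs in radians to points on the unit sphere."""
    lat, lon = coords_rad[:, 0], coords_rad[:, 1]
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=1,
    )


def radius_area_mask(
    nodes_rad: np.ndarray,
    reference_rad: np.ndarray,
    margin_radius_km: float = 100.0,
) -> np.ndarray:
    """Return True for every node within margin_radius_km of a reference node."""
    # Convert the great-circle margin to a straight-line (chord) distance
    # between points on the unit sphere, so a Euclidean KD-tree can be used.
    chord = 2.0 * np.sin(margin_radius_km / (2.0 * EARTH_RADIUS_KM))
    tree = cKDTree(latlon_to_xyz(reference_rad))
    # With distance_upper_bound, nodes without a neighbour inside the
    # radius are reported with an infinite distance.
    dist, _ = tree.query(latlon_to_xyz(nodes_rad), k=1, distance_upper_bound=chord)
    return np.isfinite(dist)


# One reference node at (lat=0, lon=0); two candidate nodes about 32 km and 637 km away.
reference = np.array([[0.0, 0.0]])
nodes = np.array([[0.0, 0.005], [0.0, 0.1]])
mask = radius_area_mask(nodes, reference, margin_radius_km=100.0)
```

Only the nearest reference node within the radius is needed to decide membership, so a single bounded nearest-neighbour query per node is enough here; the torch-cluster path is not shown.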

This results in a 60% speedup. Some benchmarks:

| Builder | Precision | Device | Global resolution | LAM resolution | MS-edge resolution | Time |
| --- | --- | --- | --- | --- | --- | --- |
| KNN | numpy.float64 | CPU | 7 | 10 | 10 | 4m 46s |
| DotProduct | torch.float64 | GPU | 7 | 10 | 10 | 3m 31s |
| DotProduct | torch.float32 | GPU | 7 | 10 | 10 | 2m 23s |
| pyg.nn.radius | torch.float64 | CPU | 7 | 10 | 10 | 1m 24s |
| pyg.nn.radius | torch.float64 | GPU | 7 | 10 | 10 | 1m 40s |
| KNN | numpy.float64 | CPU | 8 | 12 | 12 | ~1h |
| DotProduct | torch.float32 | GPU | 8 | 12 | 12 | ~36m |
| pyg.nn.radius | torch.float64 | CPU | 8 | 12 | 12 | 14m 52s |
| pyg.nn.radius | torch.float64 | GPU | 8 | 12 | 12 | 17m 23s |
| scipy.spatial.cKDTree | numpy.float? | CPU | 8 | 12 | 12 | 15m 1s |

Some figures confirming identical Stretched Grid Meshes (4,6) for the MEPS domain
image

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@mpvginde
Contributor Author

I still need to do some cleanup (some documentation and type hints need to be fixed), but feel free to have a look at it @JPXKQX

@mpvginde mpvginde added the ATS Approval Not Needed No approval needed by ATS label Mar 23, 2026
@JPXKQX
Member

JPXKQX commented Mar 25, 2026

Hi Michiel, this looks really good. As I see it, this could replace the KNNAreaMaskBuilder completely, is there any reason to leave it? Great work!

reference_node_name: str,
margin_radius_km: float = 100,
mask_attr_name: str | None = None,
use_gpu: bool = False,
Member

Looking at the results, shouldn't we always use the CPU? It seems to be faster in both cases: with and without torch-cluster.

Contributor Author

I was thinking that later we could exploit the fact that we have multiple GPUs available and divide the work over them for even more speedup. I'm not sure whether this makes sense: I don't know if torch-cluster supports parallelising this kind of work.

Member

As far as I know, only "single-device" parallelism is supported. We need something similar to the model sharding in anemoi-models (we should have the code to split into shards and gather the shards later). I think this will be easier to explore once we move the graph inside the model.

Member

Even in that case, I would expect this to happen automatically, right?

@mpvginde
Contributor Author

> Hi Michiel, this looks really good. As I see it, this could replace the KNNAreaMaskBuilder completely, is there any reason to leave it? Great work!

Not really, I would also be in favour of removing it. I just left it in for now in case people want to benchmark it themselves.

@mpvginde
Contributor Author

Probably best to use this PR to add some additional tests for LAM or SG hidden meshes.

Labels

ATS Approval Not Needed No approval needed by ATS graphs

Projects

Status: To be triaged

2 participants