
refactor(models): graph partitioning #970

Draft
japols wants to merge 2 commits into refactor/shard-shapes from refactor/graph-partition

Conversation

@japols
Member

@japols japols commented Mar 12, 2026

Description

Refactor graph partitioning logic for sharding and chunking into a unified GraphPartition abstraction that exploits dst-sorted edges for fast subgraph operations via slicing.

Previously, edge sharding across GPUs and chunking within a GPU used separate code paths with expensive bipartite_subgraph calls. This PR consolidates both into a single GraphPartition dataclass that precomputes dst/edge splits and materialises subgraphs via simple slice operations.
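The slicing idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the PR's implementation: `GraphPartitionSketch`, `from_sorted_edges`, `subgraph`, and the field names are hypothetical, NumPy stands in for the PyTorch tensors the real code presumably uses, and the even split of destination nodes is an assumed partitioning rule.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class GraphPartitionSketch:
    """Hypothetical stand-in for GraphPartition; not the PR's actual API."""

    dst_splits: np.ndarray   # boundaries in dst-node space, length num_parts + 1
    edge_splits: np.ndarray  # boundaries in edge space, length num_parts + 1

    @classmethod
    def from_sorted_edges(cls, edge_index, num_dst, num_parts):
        dst = edge_index[1]
        assert np.all(dst[:-1] <= dst[1:]), "edges must be sorted by dst"
        # Assumed rule: split destination nodes into contiguous, near-equal ranges.
        dst_splits = np.linspace(0, num_dst, num_parts + 1).astype(np.int64)
        # Because dst is sorted, each node boundary maps to one edge offset
        # (the first edge whose dst is >= the boundary).
        edge_splits = np.searchsorted(dst, dst_splits, side="left")
        return cls(dst_splits, edge_splits)

    def subgraph(self, edge_index, part):
        lo, hi = self.edge_splits[part], self.edge_splits[part + 1]
        sub = edge_index[:, lo:hi].copy()   # contiguous slice, no mask needed
        sub[1] -= self.dst_splits[part]     # relabel dst nodes to local ids
        return sub


# Toy bipartite graph: row 0 = src, row 1 = dst (already sorted by dst).
edge_index = np.array([[0, 2, 1, 3, 0, 2],
                       [0, 0, 1, 2, 3, 3]])
partition = GraphPartitionSketch.from_sorted_edges(edge_index, num_dst=4, num_parts=2)
second_part = partition.subgraph(edge_index, part=1)
```

In this sketch the splits are computed once and reused, so the same object could serve both sharding across GPUs and chunking within a GPU by varying `num_parts`.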

This lays the groundwork for extending edge sharding to processors (non-bipartite graphs) and for more efficient halo exchange communication schemes.

What problem does this change solve?

  • faster subgraph operations via slicing instead of index_select
  • clearer graph sharding/chunking code in mapper.py
  • no sorting overhead when using Triton GT
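To make the first point concrete, the toy comparison below contrasts a mask-and-gather selection with a slice over dst-sorted edges. It is a minimal sketch using NumPy as a stand-in for the PyTorch operations named in this PR; the variable names are illustrative and the data is made up.

```python
import numpy as np

# Toy edges sorted by destination; row 0 = src, row 1 = dst.
edge_index = np.array([[4, 1, 3, 0, 2, 5],
                       [0, 1, 1, 2, 2, 3]])
dst = edge_index[1]
lo_node, hi_node = 1, 3  # keep edges whose dst lies in [1, 3)

# Generic route: build a boolean mask over all edges, then gather (copies data).
mask = (dst >= lo_node) & (dst < hi_node)
gathered = edge_index[:, mask]

# Sorted route: two binary searches give one contiguous edge range.
lo_edge, hi_edge = np.searchsorted(dst, [lo_node, hi_node], side="left")
sliced = edge_index[:, lo_edge:hi_edge]  # basic slicing returns a view
```

Both routes select the same edges, but the slice touches only the range boundaries rather than every edge, which is the kind of saving the sorted layout makes possible.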

What issue or task does this change relate to?

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@japols japols self-assigned this Mar 12, 2026
@japols japols added the enhancement, models, and ATS Approval Not Needed labels Mar 12, 2026
@github-project-automation github-project-automation bot moved this to To be triaged in Anemoi-dev Mar 12, 2026
@japols japols requested review from cathalobrien and ssmmnn11 March 12, 2026 15:03
@japols
Member Author

japols commented Mar 13, 2026

Benchmark tests pass with ~10-20% throughput improvements and ~10% reduced peak memory usage.


Labels

ATS Approval Not Needed, enhancement, models

Projects

Status: To be triaged

Development

Successfully merging this pull request may close these issues.

2 participants