FlagCX is part of FlagOS, a fully open-source system software stack designed to unify the model–system–chip layers and foster an open and collaborative ecosystem. It enables a "develop once, run anywhere" workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among AI chipset-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads.
FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.
FlagCX leverages native collective communication libraries to provide full single-chip communication support across platforms. Beyond its native x-CCL integrations, FlagCX introduces original device-buffer IPC and device-buffer RDMA technologies, enabling high-performance P2P operations for both cross-chip and single-chip scenarios. These mechanisms can be seamlessly combined with native x-CCL backends to deliver optimized performance for cross-chip collective communications.
The following table summarizes the currently supported communication backends and their corresponding capabilities.
| Backend | NCCL | IXCCL | CNCL | MCCL | XCCL | DUCCL | HCCL | MUSACCL | RCCL | TCCL | ECCL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mode | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero |
| send | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ |
| recv | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ |
| broadcast | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ |
| gather | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ☓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ |
| scatter | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ |
| reduce | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ |
| allreduce | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ |
| allgather | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ |
| reducescatter | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ |
| alltoall | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ |
| alltoallv | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ |
| group ops | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/☓ |
Note that the Homo and Hetero modes refer to communication within homogeneous clusters and across heterogeneous clusters, respectively. All native collective communication libraries are listed below (in alphabetical order):
- CNCL, Cambricon Communications Library.
- DUCCL, DU Collective Communications Library.
- ECCL, Enflame Collective Communications Library.
- HCCL, Huawei Collective Communication Library (for Ascend).
- IXCCL, Iluvatar Corex Collective Communications Library.
- MCCL, Metax Collective Communications Library.
- MUSACCL, Musa Collective Communications Library.
- NCCL, NVIDIA Collective Communications Library.
- RCCL, ROCm Communication Collectives Library.
- TCCL, TsingMicro Communication Collectives Library.
- XCCL, Kunlunxin XPU Collective Communications Library.
Additionally, FlagCX supports three collective communication libraries for host-side communication:
- BOOTSTRAP: Host-side communication library built on the FlagCX bootstrap component.
- GLOO: Gloo Collective Communications Library.
- MPI: Message Passing Interface (MPI) standard.
FlagCX also integrates with upper-layer deep learning frameworks such as PyTorch and PaddlePaddle.
The table below lists the frameworks supported by FlagCX and the communication operations available in each, where the batch_XXX and XXX_coalesced ops are implemented using group primitives.
| Framework | PyTorch | PaddlePaddle |
|---|---|---|
| send | ✓ | ✓ |
| recv | ✓ | ✓ |
| all_gather | ✓ | ✓ |
| all_gather_into_tensor_coalesced | ✓ (in order, no aggregation) | ☓ |
| all_reduce | ✓ | ✓ |
| all_reduce_coalesced | ✓ (in order, no aggregation) | ☓ |
| all_to_all | ✓ | ✓ |
| all_to_all_single | ✓ | ✓ |
| barrier | ✓ | ✓ |
| batch_isend_irecv | ✓ | ✓ |
| broadcast | ✓ | ✓ |
| gather | ✓ | ✓ |
| reduce | ✓ | ✓ |
| reduce_scatter | ✓ | ✓ |
| reduce_scatter_tensor_coalesced | ✓ (in order, no aggregation) | ☓ |
| scatter | ✓ | ✓ |
Note that PyTorch support is enabled via the FlagCX Torch plugin, which provides native integration with the PyTorch distributed backend. This plugin has undergone comprehensive validation across diverse communication backends and hardware platforms, ensuring robust functionality, consistent performance, and compatibility in multi-chip heterogeneous environments.
| FlagCX Backend | NCCL | IXCCL | CNCL | MCCL | XCCL | DUCCL | HCCL | MUSACCL | RCCL | TCCL | ECCL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PyTorch Support | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
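For orientation, the following sketch shows how the operations listed above can be issued through `torch.distributed` once the FlagCX Torch plugin is installed. The backend name `"flagcx"` and the need to import the plugin are assumptions not specified in this document; everything else uses standard `torch.distributed` calls.

```python
# Minimal sketch of driving FlagCX through torch.distributed.
# Assumption: the FlagCX Torch plugin is installed and registers a process-group
# backend named "flagcx" (importing the plugin package may be required first);
# adjust the backend name and device selection to match your installation.
import os
import torch
import torch.distributed as dist

def main():
    # A launcher such as torchrun is expected to set RANK, WORLD_SIZE,
    # MASTER_ADDR, MASTER_PORT, and LOCAL_RANK in the environment.
    dist.init_process_group(backend="flagcx")  # backend name is an assumption
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    device = torch.device("cuda", int(os.environ.get("LOCAL_RANK", 0)))

    # all_reduce: every rank contributes a tensor and receives the reduced result.
    x = torch.full((4,), float(rank), device=device)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    # batch_isend_irecv: grouped point-to-point ops (mapped to group primitives).
    send = torch.full((4,), float(rank), device=device)
    recv = torch.empty(4, device=device)
    reqs = dist.batch_isend_irecv([
        dist.P2POp(dist.isend, send, (rank + 1) % world_size),
        dist.P2POp(dist.irecv, recv, (rank - 1) % world_size),
    ])
    for req in reqs:
        req.wait()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```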
Tip: To enable heterogeneous cross-chip communication with the PyTorch DDP FlagCX backend, it is recommended to use identical PyTorch versions across all nodes, as mismatched versions may lead to initialization failures during process group setup.
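To make the DDP path above concrete, here is a minimal training sketch, with the same caveat that the backend name is an assumption and that the model and data are placeholders; launch one process per accelerator with a distributed launcher such as torchrun on every node.

```python
# Minimal DDP training sketch over the FlagCX backend (assumed name "flagcx").
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train():
    dist.init_process_group(backend="flagcx")  # assumed backend name
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device("cuda", local_rank)  # adjust for non-CUDA accelerators

    model = torch.nn.Linear(128, 10).to(device)       # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for _ in range(10):                                # placeholder data and loop
        inputs = torch.randn(32, 128, device=device)
        targets = torch.randint(0, 10, (32,), device=device)
        loss = torch.nn.functional.cross_entropy(ddp_model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()        # gradient all-reduce runs through FlagCX
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    train()
```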
Please check the guides on building and testing the software.
After building and testing FlagCX, you can start training models with upper-layer deep learning frameworks such as PyTorch or PaddlePaddle, using FlagCX as the communication backend. We provide detailed user guides for both homogeneous and heterogeneous training across different hardware platforms. Please refer to the docs below:
-
We warmly welcome community contributions to help expand and strengthen the validation matrix.
Join our Discussion Channel
This project is licensed under the Apache License (Version 2.0).
