FlagCX

About

FlagCX is part of FlagOS, a fully open-source system software stack designed to unify the model–system–chip layers and foster an open and collaborative ecosystem. It enables a "develop once, run anywhere" workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among AI chipset-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads.

FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community.

FlagCX leverages native collective communication libraries to provide full single-chip communication support across platforms. Beyond its native x-CCL integrations, FlagCX introduces original device-buffer IPC and device-buffer RDMA technologies, enabling high-performance P2P operations for both cross-chip and single-chip scenarios. These mechanisms can be seamlessly combined with native x-CCL backends to deliver optimized performance for cross-chip collective communications.

Backend Support

The following table summarizes the currently supported communication backends and their corresponding capabilities.

| Backend       | NCCL        | IXCCL       | CNCL        | MCCL        | XCCL        | DUCCL       | HCCL        | MUSACCL     | RCCL        | TCCL        | ECCL        |
|---------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Mode          | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero | Homo/Hetero |
| send          | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         |
| recv          | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         |
| broadcast     | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         |
| gather        | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ☓/☓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         |
| scatter       | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         |
| reduce        | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         |
| allreduce     | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         |
| allgather     | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         |
| reducescatter | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         |
| alltoall      | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         |
| alltoallv     | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         |
| group ops     | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         | ✓/✓         | ✓/✓         | ✓/✓         | ✓/☓         |

Note that the Homo and Hetero modes refer to communication among homogeneous and heterogeneous clusters, respectively. The native collective communication libraries are listed below (in alphabetical order):

  • CNCL, Cambricon Communications Library.
  • DUCCL, DU Collective Communications Library.
  • ECCL, Enflame Collective Communications Library.
  • HCCL, Ascend Communications Library.
  • IXCCL, Iluvatar Corex Collective Communications Library.
  • MCCL, Metax Collective Communications Library.
  • MUSACCL, Musa Collective Communications Library.
  • NCCL, NVIDIA Collective Communications Library.
  • RCCL, ROCm Communication Collectives Library.
  • TCCL, TsingMicro Communication Collectives Library.
  • XCCL, Kunlunxin XPU Collective Communications Library.

Additionally, FlagCX supports three collective communication libraries for host-side communication:

  • BOOTSTRAP: Host-side communication library built using the FlagCX bootstrap component.
  • GLOO: Gloo Collective Communications Library.
  • MPI: Message Passing Interface (MPI) standard.

Application Integration

FlagCX integrates with upper-layer applications such as PyTorch and PaddlePaddle. The communication operations exposed to these frameworks are listed below, where the batch_XXX and XXX_coalesced ops refer to the usage of group primitives (a minimal group-primitive sketch follows the list):

  • send
  • recv
  • all_gather
  • all_gather_into_tensor_coalesced (in order, no aggregation)
  • all_reduce
  • all_reduce_coalesced (in order, no aggregation)
  • all_to_all
  • all_to_all_single
  • barrier
  • batch_isend_irecv
  • broadcast
  • gather
  • reduce
  • reduce_scatter
  • reduce_scatter_tensor_coalesced (in order, no aggregation)
  • scatter
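
The snippet below is a minimal sketch of how a batch_XXX op maps onto a group primitive, using PyTorch's standard torch.distributed.batch_isend_irecv API for a ring exchange. It assumes a process group has already been initialized with FlagCX selected as the distributed backend (see the plugin notes below); the function name ring_exchange is purely illustrative.

```python
import torch
import torch.distributed as dist

def ring_exchange(x: torch.Tensor) -> torch.Tensor:
    """Send `x` to the next rank and receive from the previous rank.

    Assumes dist.init_process_group(...) has already been called.
    """
    rank, world = dist.get_rank(), dist.get_world_size()
    recv = torch.empty_like(x)

    # Both P2P operations are issued together as one group ("batch") of ops.
    ops = [
        dist.P2POp(dist.isend, x, (rank + 1) % world),
        dist.P2POp(dist.irecv, recv, (rank - 1) % world),
    ]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return recv
```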

Note that PyTorch support is enabled via the FlagCX Torch plugin, which provides native integration with the PyTorch distributed backend. This plugin has undergone comprehensive validation across diverse communication backends and hardware platforms, ensuring robust functionality, consistent performance, and compatibility in multi-chip heterogeneous environments.

The plugin targets the same set of FlagCX communication backends listed in the Backend Support table above: NCCL, IXCCL, CNCL, MCCL, XCCL, DUCCL, HCCL, MUSACCL, RCCL, TCCL, and ECCL.
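
As a rough illustration, the following sketch shows how an application might initialize the plugin and run a collective through torch.distributed. The module name flagcx used for the import, the backend string "flagcx", and the CUDA-style device selection are assumptions; consult the plugin documentation for the exact names on your platform.

```python
import torch
import torch.distributed as dist

import flagcx  # hypothetical import; the actual plugin/module name may differ

def main():
    # Rank and world size are supplied by the launcher (e.g. torchrun).
    dist.init_process_group(backend="flagcx")  # backend name is an assumption
    rank = dist.get_rank()

    # Device selection shown for a CUDA-like runtime; other accelerators
    # will use their own device APIs.
    device = torch.device("cuda", rank % torch.cuda.device_count())
    x = torch.ones(1024, device=device)

    # The collective is dispatched through FlagCX to the native x-CCL backend.
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```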

Tip

When using the FlagCX backend with PyTorch DDP for heterogeneous cross-chip communication, it is recommended to use identical PyTorch versions across all nodes. Mismatched versions may lead to initialization failures during process group setup.
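
A quick way to catch such mismatches early is to compare versions across ranks right after process group initialization. The sketch below uses only standard torch.distributed calls; the helper name assert_same_torch_version is illustrative.

```python
import torch
import torch.distributed as dist

def assert_same_torch_version() -> None:
    """Raise if ranks report different PyTorch versions.

    Call right after dist.init_process_group(); with a GPU backend, make sure
    the current device is set (e.g. torch.cuda.set_device) beforehand.
    """
    versions = [None] * dist.get_world_size()
    dist.all_gather_object(versions, torch.__version__)
    if len(set(versions)) != 1:
        raise RuntimeError(f"Mismatched PyTorch versions across ranks: {versions}")
```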

Quick Start

Please check the guides on building and testing the software.

Training Models

After building and testing FlagCX, you can start training models with upper-layer deep learning frameworks such as PyTorch or PaddlePaddle, using FlagCX as the communication backend. We provide detailed user guides for both homogeneous and heterogeneous training across different hardware platforms; please refer to those guides for details.

Contribution

  • We warmly welcome community contributions to help expand and strengthen the validation matrix.

  • Join our Discussion Channel

    开源小助手 (open-source assistant)

License

This project is licensed under the Apache License (Version 2.0).