Reland support premul sum for xccl #3173

Open
Chao1Han wants to merge 3 commits into main from xccl/reland

Conversation

Contributor

@Chao1Han Chao1Han commented Mar 25, 2026

Reland #1948

disable_e2e
disable_ut

Copilot AI review requested due to automatic review settings March 25, 2026 02:26
Contributor

Copilot AI left a comment


Pull request overview

This PR reintroduces XCCL support for ReduceOp::PREMUL_SUM when building against oneCCL >= 2021.17 by adding version gating, reduction-op construction helpers, and RAII cleanup for custom reduction handles.

Changes:

  • Add compile-time ENABLE_XCCL_PREMUL_SUM_SUPPORT based on oneCCL version macros.
  • Introduce RAII wrappers and unpack helpers to create/destroy PREMUL_SUM reduction ops for both CCL “V1” and oneCCL “V2” APIs.
  • Update XCCL collective call sites to pass datatype/communicator needed to build PREMUL_SUM ops.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/xccl/xccl.h | Adds PREMUL_SUM enablement, the RAII reduction wrapper, and the PREMUL_SUM mapping logic in getXcclReduceOpV1/V2. |
| src/xccl/xccl.cpp | Updates the allreduce/reduce/reduce_scatter wrappers to pass the datatype and communicator into the reduce-op selection helpers. |


@github-actions

Performance outliers, please check!

  • 🔴 [-1, 80%): likely a regression

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | Background_Matting | 0.744062 | 0.682433 |
| torchbench_bfloat16_training | pytorch_unet | 0.765713 | 0.725342 |
| torchbench_bfloat16_training | alexnet | 0.772230 | 0.754138 |
| torchbench_bfloat16_training | resnet50 | 0.755595 | 0.764938 |
| torchbench_bfloat16_training | nvidia_deeprecommender | 0.750006 | 0.770794 |
| torchbench_bfloat16_training | shufflenet_v2_x1_0 | 0.783204 | 0.858808 |
| torchbench_bfloat16_training | LearningToPaint | 0.686767 | 0.862129 |
| torchbench_bfloat16_training | vgg16 | 0.752418 | 0.864969 |
  • 🟡 [80%, 90%): may be fluctuation

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | mobilenet_v2 | 0.843263 | 0.828889 |
| torchbench_bfloat16_training | resnet18 | 0.850908 | 0.839607 |
| torchbench_bfloat16_training | BERT_pytorch | 1.011453 | 0.895164 |
| torchbench_bfloat16_training | mnasnet1_0 | 0.879395 | 0.946367 |
| torchbench_bfloat16_training | squeezenet1_1 | 0.821079 | 0.946886 |
| torchbench_bfloat16_training | resnext50_32x4d | 0.895986 | 1.002764 |
