[XPU] fix float32 matmul precision: use FC_FLOAT default instead of FC_TF32 on XRE5 #78625
Open
YqGe585 wants to merge 1 commit into PaddlePaddle:develop from
Conversation
…atch GPU precision

On XRE5 hardware, FCCalcType<float>() previously defaulted to FC_TF32, which uses 10-bit-mantissa TensorFloat-32 accumulation. For large matmuls (e.g. [1,4096,4096] @ [4096,32000]), this causes max_abs_diff to exceed the 0.01 atol threshold vs GPU results.

Change the default to FC_FLOAT (full float32 accumulation), matching the GPU policy where FLAGS_cublas_allow_tf32=false disables TF32 by default. Users who prefer TF32 performance can still set the env var XPU_PADDLE_FC_TF32.

Verified: max_abs_diff dropped from 0.0139818 to 0.000133514 for the paddle.matmul(Tensor([1,4096,4096],"float32"), Tensor([4096,32000],"float32")) configuration that was previously failing the accuracy check.
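The selection logic described above can be sketched as follows. The enum values FC_FLOAT/FC_TF32 and the XPU_PADDLE_FC_TF32 env var come from the PR text; the function name and shape here are assumptions for illustration, not Paddle's actual FCCalcType implementation.

```cpp
#include <cstdlib>
#include <string>

// Calc types named in the PR; other variants in Paddle are omitted here.
enum FCCalcType { FC_FLOAT, FC_TF32 };

// Hypothetical sketch of the new default policy on XRE5:
// full float32 accumulation unless the user explicitly opts
// back into TF32 via the XPU_PADDLE_FC_TF32 environment variable.
FCCalcType SelectFCCalcType() {
  const char* flag = std::getenv("XPU_PADDLE_FC_TF32");
  if (flag != nullptr && std::string(flag) == "1") {
    return FC_TF32;  // user chose TF32 performance over precision
  }
  return FC_FLOAT;   // new default: matches GPU (TF32 disabled)
}
```

This mirrors the GPU side, where FLAGS_cublas_allow_tf32=false is the default and TF32 is strictly opt-in.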
Your PR has been submitted successfully. Thank you for contributing to the open-source project!

/re-run all-failed
PR Category
Custom Device
PR Types
Bug fixes
Description
On XPU XRE5 hardware, FCCalcType<float>() previously defaulted to FC_TF32 (TensorFloat-32, only 10 mantissa bits of precision). When the matrix K dimension is large (e.g. K=4096), TF32 truncation error accumulates, so the maximum absolute difference between paddle.matmul results on XPU and GPU exceeds the threshold (measured max_abs_diff=0.0139818 > atol=0.01). The GPU side disables TF32 by default (FLAGS_cublas_allow_tf32=false) and uses full float32 precision.

This PR changes the default calc type on XRE5 from FC_TF32 to FC_FLOAT, keeping XPU precision behavior consistent with GPU. Users who need TF32 performance can still enable it by setting the environment variable XPU_PADDLE_FC_TF32.

Before the fix: max_abs_diff=0.0139818 → accuracy check fails
After the fix: max_abs_diff=0.000133514 → accuracy check passes (roughly 100× better precision)

Does this change precision?
Yes. XPU float32 matmul precision now matches GPU: max_abs_diff drops from ~0.014 to ~0.00013.
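Why TF32 loses precision at large K can be illustrated with a small model of the format: TF32 keeps float32's 8-bit exponent but only 10 of its 23 mantissa bits. The sketch below (pure-Python, an illustrative model rather than the hardware rounding mode) truncates a float32 mantissa down to 10 bits and shows the per-value error; a K=4096 reduction accumulates thousands of such errors, which is how max_abs_diff grows past atol=0.01.

```python
import struct

def to_tf32(x: float) -> float:
    """Round-trip a value through a TF32-like format by zeroing the
    low 13 bits of the float32 mantissa (23 -> 10 mantissa bits).
    Illustrative model only, not the exact hardware rounding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~0x1FFF  # drop the 13 least-significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A single truncation already loses up to ~2^-10 relative precision
# for values near 1.0; full float32 keeps all 23 mantissa bits.
x = 1.2345678
err = abs(x - to_tf32(x))
print(err)  # on the order of 1e-4 for this value
```

Values exactly representable in 10 mantissa bits (e.g. 1.0) pass through unchanged, which is why small or power-of-two-friendly matmuls may not show the discrepancy.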