[XPU] fix float32 matmul precision: use FC_FLOAT default instead of FC_TF32 on XRE5 #78625
Open
YqGe585 wants to merge 1 commit into PaddlePaddle:develop from
Conversation
…atch GPU precision

On XRE5 hardware, FCCalcType<float>() previously defaulted to FC_TF32, which uses 10-bit-mantissa TensorFloat-32 accumulation. For large matmuls (e.g. [1,4096,4096] @ [4096,32000]), this causes max_abs_diff to exceed the 0.01 atol threshold vs GPU results.

Change the default to FC_FLOAT (full float32 accumulation), matching the GPU policy where FLAGS_cublas_allow_tf32=false disables TF32 by default. Users who prefer TF32 performance can still set the env var XPU_PADDLE_FC_TF32.

Verified: max_abs_diff dropped from 0.0139818 to 0.000133514 for the paddle.matmul(Tensor([1,4096,4096],"float32"), Tensor([4096,32000],"float32")) configuration that was previously failing the accuracy check.
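The selection logic described above can be sketched as follows. The enum values FC_FLOAT/FC_TF32 and the XPU_PADDLE_FC_TF32 env var come from the PR text; the function name and shape here are assumptions for illustration, not Paddle's actual FCCalcType implementation.

```cpp
#include <cstdlib>
#include <string>

// Calc types named in the PR; other variants in Paddle are omitted here.
enum FCCalcType { FC_FLOAT, FC_TF32 };

// Hypothetical sketch of the new default policy on XRE5:
// full float32 accumulation unless the user explicitly opts
// back into TF32 via the XPU_PADDLE_FC_TF32 environment variable.
FCCalcType SelectFCCalcType() {
  const char* flag = std::getenv("XPU_PADDLE_FC_TF32");
  if (flag != nullptr && std::string(flag) == "1") {
    return FC_TF32;  // user chose TF32 performance over precision
  }
  return FC_FLOAT;   // new default: matches GPU (TF32 disabled)
}
```

This mirrors the GPU side, where FLAGS_cublas_allow_tf32=false is the default and TF32 is strictly opt-in.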
Your PR has been submitted successfully. Thank you for contributing to the open-source project!

/re-run all-failed
PR Category
Custom Device
PR Types
Bug fixes
Description
On XPU XRE5 hardware, FCCalcType<float>() previously defaulted to FC_TF32 (TensorFloat-32, only 10 mantissa bits of precision). When the matrix K dimension is large (e.g. K=4096), TF32 truncation error accumulates, so the maximum absolute difference between paddle.matmul results on XPU and GPU exceeds the threshold (measured max_abs_diff=0.0139818 > atol=0.01). The GPU side disables TF32 by default (FLAGS_cublas_allow_tf32=false) and uses full float32 precision.

This PR changes the default calc type on XRE5 from FC_TF32 to FC_FLOAT, keeping XPU precision behavior consistent with GPU. Users who need TF32 performance can still enable it by setting the environment variable XPU_PADDLE_FC_TF32.

Before the fix: max_abs_diff=0.0139818 → accuracy check fails
After the fix: max_abs_diff=0.000133514 → accuracy check passes (roughly 100× better precision)

Does this change precision?
Yes. XPU float32 matmul precision now matches GPU: max_abs_diff drops from ~0.014 to ~0.00013.
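Why TF32 loses precision at large K can be illustrated with a small model of the format: TF32 keeps float32's 8-bit exponent but only 10 of its 23 mantissa bits. The sketch below (pure-Python, an illustrative model rather than the hardware rounding mode) truncates a float32 mantissa down to 10 bits and shows the per-value error; a K=4096 reduction accumulates thousands of such errors, which is how max_abs_diff grows past atol=0.01.

```python
import struct

def to_tf32(x: float) -> float:
    """Round-trip a value through a TF32-like format by zeroing the
    low 13 bits of the float32 mantissa (23 -> 10 mantissa bits).
    Illustrative model only, not the exact hardware rounding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~0x1FFF  # drop the 13 least-significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A single truncation already loses up to ~2^-10 relative precision
# for values near 1.0; full float32 keeps all 23 mantissa bits.
x = 1.2345678
err = abs(x - to_tf32(x))
print(err)  # on the order of 1e-4 for this value
```

Values exactly representable in 10 mantissa bits (e.g. 1.0) pass through unchanged, which is why small or power-of-two-friendly matmuls may not show the discrepancy.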