FIX: TF32 warning (#43012) #43015
Conversation
ArthurZucker left a comment:

This does not sound bad, but at the same time it is very specific; we want users to know that full precision could be necessary.
I did some checking and the output of
What does this PR do?
This PR replaces the matrix multiplication operator (@) with broadcasting element-wise multiplication (*) in the RotaryEmbedding implementation for several major models (Llama, Mistral, Mixtral, Qwen2, Gemma, Gemma2).
When compiling a model with torch.compile in bfloat16, the RoPE frequency calculation (which is intentionally kept in float32 for precision) triggers a UserWarning about TensorFloat32 (TF32) whenever TF32 is not enabled.
Since the shapes involved in this specific operation, [batch, dim/2, 1] and [batch, 1, seq_len], reduce the matmul to an outer product, using @ is mathematically equivalent to * with broadcasting. However, * avoids the matrix-multiplication code path in the compiler, silencing the false-positive warning and potentially offering a minor performance gain by not dispatching a full GEMM call for a simple outer product.
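As a quick sanity check of the claimed equivalence, the sketch below uses NumPy (whose broadcasting and batched-matmul rules match PyTorch's for these shapes) with hypothetical small dimensions; the real RoPE code operates on `inv_freq` and `position_ids` tensors of the same layout:

```python
import numpy as np

# Hypothetical small sizes standing in for the real RoPE shapes.
batch, half_dim, seq_len = 2, 4, 6
inv_freq = np.random.rand(batch, half_dim, 1).astype(np.float32)      # [batch, dim/2, 1]
position_ids = np.random.rand(batch, 1, seq_len).astype(np.float32)   # [batch, 1, seq_len]

# Batched matmul contracts over the size-1 inner dimension -> outer product.
freqs_matmul = inv_freq @ position_ids        # shape [batch, dim/2, seq_len]

# Broadcasting elementwise multiply expands both size-1 axes -> same result,
# without ever entering the GEMM code path.
freqs_mul = inv_freq * position_ids           # shape [batch, dim/2, seq_len]

assert freqs_matmul.shape == (batch, half_dim, seq_len)
assert np.allclose(freqs_matmul, freqs_mul)
```

Because the contracted dimension has size 1, each output element is a single product, so no accumulation (and hence no TF32-sensitive reduction) is involved.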
Fixes #43012
Before submitting
Did you read the contributor guideline, Pull Request section?
Who can review?
Anyone in the community is free to review the PR once the tests have passed.