
add torch.compile to blockwise quantized kernel unit tests#4187

Open
iamzainhuda wants to merge 1 commit into `main` from `torch-compile-kernel-tests`

Conversation

@iamzainhuda
Contributor

@iamzainhuda iamzainhuda commented Mar 26, 2026

Summary

The quantization tests in `test/prototype/blockwise_fp8_training/test_blockwise_kernels.py` are now parameterized with `use_compile`, and a `_maybe_compile(..., fullgraph=True)` helper is used to compile the Triton quantization entry points when requested.

Testing

pytest test/prototype/blockwise_fp8_training/test_blockwise_kernels.py

@pytorch-bot

pytorch-bot bot commented Mar 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4187

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 25c1971 with merge base 96a9cdf:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 26, 2026
@iamzainhuda iamzainhuda added the module: training quantize_ api training flow label Mar 26, 2026
@danielvegamyhre
Contributor

@iamzainhuda I think there was a miscommunication: we don't want to directly wrap an individual Triton custom op in `torch.compile` and test that. We want to compile a full blockwise linear layer (this test), ensure there are no graph breaks (`fullgraph=True`), and verify that the numerics of outputs/grads pass the same threshold testing as the eager mode tests.
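The suggested module-level check might be sketched roughly as follows. This is an assumption-laden illustration: `nn.Linear` stands in for the blockwise-quantized linear layer, and `check_compile_vs_eager` is a hypothetical helper, not code from this PR:

```python
import copy

import torch
import torch.nn as nn


def check_compile_vs_eager(make_layer, x, rtol=1e-3, atol=1e-3, backend="inductor"):
    # Compile a *full* layer with fullgraph=True (so any graph break raises)
    # and compare outputs and input grads against the eager reference.
    eager = make_layer()
    compiled = torch.compile(copy.deepcopy(eager), fullgraph=True, backend=backend)

    x_eager = x.clone().requires_grad_(True)
    x_compiled = x.clone().requires_grad_(True)

    out_eager = eager(x_eager)
    out_compiled = compiled(x_compiled)
    torch.testing.assert_close(out_compiled, out_eager, rtol=rtol, atol=atol)

    out_eager.sum().backward()
    out_compiled.sum().backward()
    torch.testing.assert_close(x_compiled.grad, x_eager.grad, rtol=rtol, atol=atol)
```

For example: `check_compile_vs_eager(lambda: nn.Linear(16, 16), torch.randn(4, 16))`. The real test would construct the blockwise linear layer and reuse the eager tests' tolerance thresholds.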


@JiwaniZakir JiwaniZakir left a comment


The `_maybe_compile` helper in `test_blockwise_kernels.py` is clean and avoids repetition, but wrapping a freshly created lambda on each call means `torch.compile` receives a new function object on every test invocation, with no opportunity for cache reuse across parametrized runs. This is fine for correctness but adds compile overhead.

More importantly, there are no `torch._dynamo.reset()` calls between tests: compiled artifacts from one parametrized combination (e.g., `block_size=128`) can leak into the next (`block_size=256`), which can mask failures or produce misleading error messages in CI. It would be worth calling `torch._dynamo.reset()` at the start of each test body when `use_compile=True`, or using a pytest fixture/autouse teardown for that.

Also, the `use_compile` parametrize decorator is placed closer to the function than `block_size` in all cases, so `use_compile` varies as the inner loop of the test matrix. This is a minor point, but keeping decorator ordering consistent across all five tests is good hygiene, and it is consistent here.

