Add MXFP4 Per Token Group Quant kernel and tests#106

Open
sspintel wants to merge 3 commits into sgl-project:main from sspintel:dev/sp/per_token_group_quant_mxfp4

sspintel (Contributor) commented Feb 3, 2026

No description provided.

sspintel marked this pull request as ready for review on February 12, 2026 at 06:43.
sspintel force-pushed the dev/sp/per_token_group_quant_mxfp4 branch from edab1a0 to e245569 on February 17, 2026 at 05:57.

batch_size_range = [1, 2, 4, 8, 16, 32, 64] if not IS_CI else [1, 4, 16]
seq_len_range = [64, 128, 256, 512, 1024, 2048] if not IS_CI else [64, 256]
group_size_range = [32, 64, 128]
Collaborator

Do we support group sizes larger than 32 for FP4? We have this check, don't we?

assert group_size == 32, f"group_size must be 32 for MXFP4, got {group_size}"
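For context, the OCP Microscaling (MX) specification fixes the MXFP4 block size at 32 elements: each block shares one E8M0 (power-of-two) scale, and its elements are stored as FP4 E2M1, so a 64 or 128 entry in `group_size_range` would only trip the assert above. A minimal pure-Python reference sketch of per-token-group MXFP4 quantization under those assumptions follows. It is not the kernel in this PR; the helper names (`round_to_e2m1`, `quantize_mxfp4_group`, `per_token_group_quant_mxfp4_ref`) are hypothetical, the shared exponent uses the spec's floor(log2(amax)) - emax_elem rule, and treating an all-zero group as scale 1.0 is an assumption.

```python
import math

# Hypothetical reference helpers for illustration only; not the kernel in this PR.
# FP4 E2M1 magnitudes from the OCP MX spec; an even index means mantissa bit 0.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)
MXFP4_BLOCK = 32   # MX block size fixed by the spec


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6.0, ties to even."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_VALUES[-1])  # saturate instead of overflowing
    best = 0
    for i, v in enumerate(E2M1_VALUES):
        d, bd = abs(mag - v), abs(mag - E2M1_VALUES[best])
        # Take a strictly closer value, or break a tie toward the even code.
        if d < bd or (d == bd and i % 2 == 0 and best % 2 == 1):
            best = i
    return sign * E2M1_VALUES[best]


def quantize_mxfp4_group(group):
    """Return (shared E8M0 exponent, E2M1 elements) for one 32-element group."""
    amax = max(abs(v) for v in group)
    if amax == 0.0:
        shared_exp = 0  # assumption: an all-zero group uses scale 1.0
    else:
        # floor(log2(amax)) computed exactly via frexp, minus the element emax.
        shared_exp = math.frexp(amax)[1] - 1 - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # clamp to the E8M0 range
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e2m1(v / scale) for v in group]


def per_token_group_quant_mxfp4_ref(x, group_size=MXFP4_BLOCK):
    """Quantize each row (token) of x in consecutive groups of group_size."""
    assert group_size == MXFP4_BLOCK, f"group_size must be 32 for MXFP4, got {group_size}"
    out = []
    for row in x:
        assert len(row) % group_size == 0, "hidden size must be a multiple of 32"
        out.append([quantize_mxfp4_group(row[i:i + group_size])
                    for i in range(0, len(row), group_size)])
    return out


if __name__ == "__main__":
    row = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0] * 4   # amax 6.0
    scaled = [4.0 * v for v in row]                      # amax 24.0
    q = per_token_group_quant_mxfp4_ref([row + scaled])
    print([exp for exp, _ in q[0]])  # [0, 2]
```

Because the scale is a pure power of two, a group whose values are already E2M1 values times 2^k round-trips exactly, which makes it a convenient accuracy check for a kernel test.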

@@ -0,0 +1,356 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

Please add this benchmark test to the CI flow in .github/workflows/pr-test-xpu.yml.
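Once the benchmark runs in pr-test-xpu.yml, the `IS_CI` switch in its sweep ranges decides how many configurations the job executes. A minimal sketch, assuming `IS_CI` is read from the `CI` environment variable (GitHub Actions sets it to `true`) and that the group-size sweep is narrowed to 32 to match the MXFP4 assert; `build_sweep` is a hypothetical helper, not code from this PR.

```python
import itertools
import os

# Assumption: the benchmark's IS_CI flag mirrors the CI env var ("true" on GitHub Actions).
IS_CI = os.environ.get("CI", "").lower() == "true"


def build_sweep(is_ci: bool):
    """Hypothetical helper: (batch_size, seq_len, group_size) configs to benchmark."""
    batch_size_range = [1, 2, 4, 8, 16, 32, 64] if not is_ci else [1, 4, 16]
    seq_len_range = [64, 128, 256, 512, 1024, 2048] if not is_ci else [64, 256]
    group_size_range = [32]  # MXFP4 blocks are always 32 elements
    return list(itertools.product(batch_size_range, seq_len_range, group_size_range))


if __name__ == "__main__":
    configs = build_sweep(IS_CI)
    print(f"IS_CI={IS_CI}: {len(configs)} configurations")
```

Outside CI this yields 7 × 6 × 1 = 42 configurations; in CI it drops to 3 × 2 × 1 = 6, which keeps the added job short.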

print("\n" + "=" * 100)
print("Summary Statistics by Provider")
print("=" * 100)
summary = df.groupby("provider").agg(
Collaborator

Do you have performance numbers? How much memory bandwidth can we achieve with this kernel?
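One common way to answer the bandwidth question is to report effective bandwidth: the bytes the kernel must move divided by its measured latency. A minimal sketch, assuming a bf16 input (2 bytes per element), packed FP4 output (two elements per byte) and one 1-byte E8M0 scale per 32-element group; `mxfp4_quant_bandwidth_gbps` is a hypothetical helper, and the latency in the usage line is a placeholder, not a measurement.

```python
def mxfp4_quant_bandwidth_gbps(num_tokens: int, hidden_size: int, latency_us: float,
                               input_bytes: int = 2, group_size: int = 32) -> float:
    """Effective memory bandwidth (GB/s) of one MXFP4 per-token-group quant call."""
    numel = num_tokens * hidden_size
    read_bytes = numel * input_bytes      # bf16/fp16 activations read
    fp4_bytes = numel // 2                # two E2M1 values packed per byte written
    scale_bytes = numel // group_size     # one E8M0 byte per group written
    total_bytes = read_bytes + fp4_bytes + scale_bytes
    # bytes / (latency_us * 1e-6 s) / 1e9 == bytes / (latency_us * 1e3)
    return total_bytes / (latency_us * 1e3)


if __name__ == "__main__":
    # Placeholder latency for illustration only, not a measured result.
    bw = mxfp4_quant_bandwidth_gbps(num_tokens=1024, hidden_size=4096, latency_us=10.0)
    print(f"{bw:.1f} GB/s")
```

Dividing this figure by the device's peak memory bandwidth gives the fraction of peak the kernel reaches, which is a natural extra column for the per-provider summary.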
