
Conversation

@xin3he (Contributor) commented Feb 4, 2026

Description

Add support for directly loading GPT-OSS models quantized in MXFP4 format: detect MXFP4 quantization from the checkpoint config and dequantize the weights during model loading.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Copilot AI review requested due to automatic review settings February 4, 2026 10:13
Copilot AI (Contributor) left a comment


Pull request overview

This PR adds support for directly loading GPT-OSS models quantized in MXFP4 format by automatically detecting the MXFP4 quantization and applying dequantization during model loading (see the sketch after the change list below).

Changes:

  • Updated model references in test files from local/unsloth paths to official OpenAI model identifiers
  • Added MXFP4 quantization detection and automatic dequantization support in model loading utilities
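
A rough sketch of that detection-plus-dequantization flow for a local checkpoint. The helper name `_is_mxfp4_model` matches the diff below, but this body, the example path, and the local-only lookup are illustrative simplifications; `Mxfp4Config(dequantize=True)` is the transformers-side switch for dequantizing MXFP4 checkpoints on load.

```python
import json
import os

from transformers import AutoModelForCausalLM, Mxfp4Config

def _is_mxfp4_model(model_dir: str) -> bool:
    """Detect whether a checkpoint's config declares MXFP4 quantization."""
    config_path = os.path.join(model_dir, "config.json")
    if not os.path.isfile(config_path):  # simplification: local checkpoints only
        return False
    with open(config_path) as f:
        config = json.load(f)
    quant_method = config.get("quantization_config", {}).get("quant_method", "")
    return quant_method == "mxfp4"

model_dir = "/models/gpt-oss-20b"  # hypothetical local MXFP4 checkpoint
if _is_mxfp4_model(model_dir):
    # Ask transformers to dequantize the MXFP4 weights while loading.
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, quantization_config=Mxfp4Config(dequantize=True)
    )
```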

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • test/test_cuda/models/test_moe_model.py: Updated GPT-OSS model reference from local path to OpenAI identifier
  • test/test_cpu/models/test_moe_model.py: Updated GPT-OSS model reference from unsloth path to OpenAI identifier
  • auto_round/utils/model.py: Added MXFP4 detection function and integrated dequantization config into model loading

Signed-off-by: He, Xin3 <[email protected]>
@xin3he xin3he requested review from n1ck-guo and yiliu30 February 9, 2026 06:31
        )
        model_type = getattr(config, "model_type", "")
        return quant_method == "mxfp4" and model_type in supported_model_types
    except Exception:
Contributor

Why use a try/except here?

Contributor Author

Agreed, this should be changed to a more efficient approach.

    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding the model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
        try:
Contributor

In my opinion, I would prefer a version check over try/except. Using too many try/except blocks can prevent bugs from being exposed.
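
A minimal sketch of the version-gate alternative the reviewer suggests. The `4.55.0` threshold is an assumption about when transformers gained `Mxfp4Config`, not a verified minimum:

```python
from packaging import version

import transformers

def _mxfp4_load_supported() -> bool:
    # Gate on the installed transformers version instead of catching
    # ImportError at call time, so unrelated failures still surface.
    return version.parse(transformers.__version__) >= version.parse("4.55.0")

if _mxfp4_load_supported():
    from transformers import Mxfp4Config

    quantization_config = Mxfp4Config(dequantize=True)
```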

def setup_gpt_oss():
    """Fixture to set up the GPT-OSS model and tokenizer."""
-   model_name = "/models/gpt-oss-20b-BF16"
+   model_name = "openai/gpt-oss-20b"
Contributor

This path is currently used to load the BF16 gpt-oss model, so please keep it as is.
You can add a new path specifically for the MXFP4 model.
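
One way to address this, with illustrative names only: keep the existing BF16 path and introduce a separate constant for the MXFP4 checkpoint.

```python
# Existing BF16 checkpoint path, kept as is per the review.
bf16_model_name = "/models/gpt-oss-20b-BF16"
# New, separate identifier for the MXFP4-quantized checkpoint.
mxfp4_model_name = "openai/gpt-oss-20b"
```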

        trust_remote_code=trust_remote_code,
        device_map="auto" if use_auto_mapping else None,
    )
    model = model_cls.from_pretrained(pretrained_model_name_or_path, **load_kwargs)
Contributor

We currently don’t have enough test coverage for HPU, so please make any changes carefully. If possible, adding more UTs would be really helpful!
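
A minimal pytest sketch of the kind of unit test the reviewer asks for. The import path matches `auto_round/utils/model.py` from the diff, but the test itself is hypothetical and assumes `gpt_oss` is among the supported model types:

```python
import json

def test_is_mxfp4_model_detects_quant_method(tmp_path):
    # Minimal fake checkpoint directory containing only config.json.
    config = {
        "model_type": "gpt_oss",
        "quantization_config": {"quant_method": "mxfp4"},
    }
    (tmp_path / "config.json").write_text(json.dumps(config))

    from auto_round.utils.model import _is_mxfp4_model

    assert _is_mxfp4_model(str(tmp_path))
```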


    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding the model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
Contributor

I have a small concern that this check might slow down the Auto-round initialization.
Could you please double-check it? Thanks!

Contributor Author

Agreed, thanks!
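
One way to keep the check cheap, sketched under the assumption that the config can be loaded once during initialization and reused for both the detection and the subsequent model load:

```python
from transformers import AutoConfig

pretrained_model_name_or_path = "openai/gpt-oss-20b"  # example checkpoint

# Load the config a single time and derive the MXFP4 flag from it, so the
# detection adds no extra disk or network round trip during initialization.
config = AutoConfig.from_pretrained(pretrained_model_name_or_path)
quant_cfg = getattr(config, "quantization_config", None) or {}
is_mxfp4 = quant_cfg.get("quant_method", "") == "mxfp4"
```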
