support gpt-oss mxfp4 directly loading #1401
base: main
Conversation
Pull request overview
This PR adds support for directly loading GPT-OSS models quantized in the MXFP4 format, by automatically detecting MXFP4 quantization and applying dequantization during model loading.
Changes:
- Updated model references in test files from local/unsloth paths to official OpenAI model identifiers
- Added MXFP4 quantization detection and automatic dequantization support in model loading utilities
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/test_cuda/models/test_moe_model.py | Updated GPT-OSS model reference from local path to OpenAI identifier |
| test/test_cpu/models/test_moe_model.py | Updated GPT-OSS model reference from unsloth path to OpenAI identifier |
| auto_round/utils/model.py | Added MXFP4 detection function and integrated dequantization config into model loading |
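As a rough illustration of the flow described above — detect MXFP4 from the checkpoint's `config.json` and, only in that case, request dequantization at load time — here is a stdlib-only sketch. The function names and the exact dequantization payload are assumptions for illustration, not the PR's actual code; in transformers the payload would be a quantization config object rather than a plain dict.

```python
import json
import os


def detect_quant_method(model_path: str):
    """Read quant_method from the checkpoint's config.json, if present."""
    config_file = os.path.join(model_path, "config.json")
    if not os.path.isfile(config_file):
        return None
    with open(config_file) as f:
        qc = json.load(f).get("quantization_config") or {}
    return qc.get("quant_method")


def build_load_kwargs(model_path: str) -> dict:
    """Only set quantization_config when dequantization is explicitly needed,
    so the model's built-in config is not overridden."""
    kwargs: dict = {}
    if detect_quant_method(model_path) == "mxfp4":
        # Request on-the-fly dequantization to a high-precision dtype
        # (a dict stands in for the real quantization config object here).
        kwargs["quantization_config"] = {"dequantize": True}
    return kwargs
```

The key design point is that a non-MXFP4 checkpoint gets an empty `kwargs`, leaving whatever quantization config the model ships with untouched.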
Signed-off-by: He, Xin3 <[email protected]>
```python
        )
        model_type = getattr(config, "model_type", "")
        return quant_method == "mxfp4" and model_type in supported_model_types
    except Exception:
```
Why use a try/except here?
Agreed, this should be changed to a more efficient approach.
```python
    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
        try:
```
In my opinion, I would prefer a version check over try/except. Using too many try/except blocks might prevent some bugs from being exposed.
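A minimal stdlib-only sketch of the version-check alternative suggested here. The `"4.55.0"` threshold and the function names are assumptions for illustration, not the actual minimum transformers version required for MXFP4 support.

```python
def _parse(v: str) -> tuple:
    # Minimal "MAJOR.MINOR.PATCH" parser; real code should prefer
    # packaging.version, which also handles pre-release tags.
    return tuple(int(p) for p in v.split(".")[:3])


def supports_mxfp4_dequantize(installed: str, minimum: str = "4.55.0") -> bool:
    """Gate the MXFP4 path on an explicit version comparison, not try/except."""
    return _parse(installed) >= _parse(minimum)
```

With an explicit check like this, an unexpected failure inside the MXFP4 path surfaces as a real error instead of being silently swallowed by a broad `except Exception`.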
```diff
 def setup_gpt_oss():
     """Fixture to set up the GPT-OSS model and tokenizer."""
-    model_name = "/models/gpt-oss-20b-BF16"
+    model_name = "openai/gpt-oss-20b"
```
This path is currently used to load the BF16 gpt-oss model, so please keep it as is.
You can add a new path specifically for the MXFP4 model.
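A hypothetical sketch of this suggestion: keep the existing local BF16 path and register the official MXFP4 checkpoint under a separate key, so tests can select either variant explicitly. The dict layout and helper name are assumptions, not the repo's actual test code.

```python
GPT_OSS_MODELS = {
    "bf16": "/models/gpt-oss-20b-BF16",  # existing local BF16 model (kept as is)
    "mxfp4": "openai/gpt-oss-20b",       # official MXFP4 checkpoint (new)
}


def gpt_oss_model_name(variant: str = "bf16") -> str:
    """Resolve the model path/identifier for the requested variant."""
    return GPT_OSS_MODELS[variant]
```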
```python
        trust_remote_code=trust_remote_code,
        device_map="auto" if use_auto_mapping else None,
    )
    model = model_cls.from_pretrained(pretrained_model_name_or_path, **load_kwargs)
```
We currently don’t have enough test coverage for HPU, so please make any changes carefully. If possible, adding more UTs would be really helpful!
```python
    # Check if model is MXFP4 quantized and needs dequantization
    # Only set quantization_config when explicitly needed, to avoid overriding model's built-in config
    if _is_mxfp4_model(pretrained_model_name_or_path):
```
I have a small concern that this check might slow down the Auto-round initialization.
Could you please double-check it? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, thanks!