Add AFMOE mappings for awq and smoothquant #2316

Open
bartowski1182 wants to merge 7 commits into vllm-project:main from bartowski1182:main

@bartowski1182

SUMMARY:
These mappings are needed to properly apply AWQ and SmoothQuant to the Trinity series of models, which use the AfmoeForCausalLM architecture.

TEST PLAN:
Quality was tested with benchmarks. Without these changes the benchmark results were extremely low; with these changes they were close to margin of error compared to bf16/FP8 dynamic.

This can be tested on Trinity-Large-Preview:

https://huggingface.co/arcee-ai/Trinity-Large-Preview

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello @bartowski1182, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the AfmoeForCausalLM model, a key component of the Trinity series, into the existing AWQ and SmoothQuant quantization frameworks. By defining precise layer mappings tailored to Afmoe's architectural specifics, these changes enable the effective application of quantization techniques, which is critical for improving the quality and performance of quantized models, as demonstrated by significant benchmark improvements.

Highlights

  • AWQ Mappings for AfmoeForCausalLM: Added specific AWQ quantization mappings for the AfmoeForCausalLM model, addressing its unique dual normalization (pre_mlp_layernorm) and attention gate projection (gate_proj) architecture. These mappings ensure proper quantization of layers like input_layernorm, v_proj, pre_mlp_layernorm, and up_proj.
  • SmoothQuant Mappings for AfmoeForCausalLM: Introduced dedicated SmoothQuant layer mappings for AfmoeForCausalLM. These mappings define how to balance and smooth layers, specifically targeting input_layernorm and pre_mlp_layernorm to optimize the quantization process for this model.
  • Enabling Trinity Series Quantization: These new mappings are crucial for correctly applying AWQ and SmoothQuant to the Trinity series of models, particularly AfmoeForCausalLM. Without these changes, benchmark results were extremely low, indicating that these mappings are essential for achieving acceptable quantized model quality.
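As a rough illustration of what such mappings express, the sketch below pairs each normalization layer with the projections that consume its output, then resolves the patterns against a toy list of module names. The `SmoothBalanceMapping` class, the regexes, and the module names are all assumptions made for this example; they are not the PR's actual diff and not llm-compressor's own classes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SmoothBalanceMapping:
    """Hypothetical mapping entry: one layer to smooth, several layers to balance."""
    smooth_layer: str
    balance_layers: tuple


# Illustrative patterns only; the real regexes live in the PR diff, not in this thread.
EXAMPLE_AFMOE_MAPPINGS = [
    SmoothBalanceMapping(
        smooth_layer=r".*input_layernorm$",
        balance_layers=(
            r".*self_attn\.q_proj$",
            r".*self_attn\.k_proj$",
            r".*self_attn\.v_proj$",
            r".*self_attn\.gate_proj$",  # assumed attention gate projection
        ),
    ),
    SmoothBalanceMapping(
        smooth_layer=r".*pre_mlp_layernorm$",
        balance_layers=(r".*mlp\.gate_proj$", r".*mlp\.up_proj$"),
    ),
]

# Toy module names standing in for one decoder layer (assumed naming).
TOY_MODULE_NAMES = [
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.gate_proj",
    "model.layers.0.pre_mlp_layernorm",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]


def resolve(mapping, module_names):
    """Return (smooth, balance) lists of module names matched by the mapping."""
    smooth = [n for n in module_names if re.fullmatch(mapping.smooth_layer, n)]
    balance = [
        n
        for n in module_names
        if any(re.fullmatch(p, n) for p in mapping.balance_layers)
    ]
    return smooth, balance


if __name__ == "__main__":
    for m in EXAMPLE_AFMOE_MAPPINGS:
        print(resolve(m, TOY_MODULE_NAMES))
```

If a model uses a layer name that none of the patterns cover (for example a second pre-MLP norm), that layer is silently left out of smoothing, which is one plausible reason a missing mapping could hurt quantized quality.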

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for AFMoE models by adding the necessary mappings for AWQ and SmoothQuant. The changes are well-contained and follow existing patterns. I've identified a minor improvement opportunity in the regular expressions used for the AWQ mappings to ensure more precise matching. Overall, the implementation is solid.
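The review does not spell out which pattern it has in mind, so the following is only a guess at the kind of issue "more precise matching" could refer to. Assuming a model has a `gate_proj` in both its attention block and its MLP (the module names below are hypothetical), a loosely anchored regex picks up both, while a pattern qualified by its parent module picks up only one.

```python
import re

# Hypothetical module names; an Afmoe-style layer is assumed to have two gate_proj modules.
MODULE_NAMES = [
    "model.layers.0.self_attn.gate_proj",
    "model.layers.0.mlp.gate_proj",
]


def matching(pattern, names):
    """Return the names in which the regex pattern is found."""
    return [n for n in names if re.search(pattern, n)]


# Loose: matches any module whose name ends in gate_proj.
loose = matching(r"gate_proj$", MODULE_NAMES)

# Precise: the parent module is part of the pattern, and the dot is escaped.
precise = matching(r"self_attn\.gate_proj$", MODULE_NAMES)

if __name__ == "__main__":
    print(len(loose), len(precise))
```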

Signed-off-by: Colin Kealty <3266127+bartowski1182@users.noreply.github.com>
@dsikka added the ready label (When a PR is ready for review) on Feb 4, 2026
@mergify
Contributor

mergify bot commented Feb 4, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

Signed-off-by: Colin Kealty <3266127+bartowski1182@users.noreply.github.com>
Collaborator

@brian-dellabetta left a comment

Thanks for the contribution!

@HDCharles
Collaborator

would probably include a repro script for your test like: #2340

@bartowski1182
Author

@HDCharles do you need me to do that or are you letting me know you'll be doing that?

@HDCharles
Collaborator

yeah can you add a repro script to the PR description so it's clear how you tested it?

@HDCharles
Collaborator

also now there are conflicts

@HDCharles self-requested a review on February 17, 2026 16:10
@mergify
Contributor

mergify bot commented Feb 17, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bartowski1182.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label on Feb 17, 2026
@kylesayrs
Collaborator

@bartowski1182 Could you please rebase on main? I'll make sure this gets in

Labels

needs-rebase, ready (When a PR is ready for review)