Add AFMOE mappings for awq and smoothquant by bartowski1182 · Pull Request #2316 · vllm-project/llm-compressor

bartowski1182 · 2026-01-30T17:23:20Z

SUMMARY:
These mappings are needed to properly apply AWQ and SmoothQuant to the Trinity series of models (AfmoeForCausalLM).

TEST PLAN:
Quality was tested with benchmarks. Without these changes the benchmark results were extremely low; with these changes they were within roughly the margin of error of bf16/FP8 dynamic.

Can test on Trinity-Large-Preview

https://huggingface.co/arcee-ai/Trinity-Large-Preview

github-actions · 2026-01-30T17:23:29Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

gemini-code-assist · 2026-01-30T17:23:38Z

Summary of Changes

Hello @bartowski1182, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the AfmoeForCausalLM model, a key component of the Trinity series, into the existing AWQ and SmoothQuant quantization frameworks. By defining precise layer mappings tailored to Afmoe's architectural specifics, these changes enable the effective application of quantization techniques, which is critical for improving the quality and performance of quantized models, as demonstrated by significant benchmark improvements.

Highlights

AWQ Mappings for AfmoeForCausalLM: Added specific AWQ quantization mappings for the AfmoeForCausalLM model, addressing its unique dual normalization (pre_mlp_layernorm) and attention gate projection (gate_proj) architecture. These mappings ensure proper quantization of layers like input_layernorm, v_proj, pre_mlp_layernorm, and up_proj.
SmoothQuant Mappings for AfmoeForCausalLM: Introduced dedicated SmoothQuant layer mappings for AfmoeForCausalLM. These mappings define how to balance and smooth layers, specifically targeting input_layernorm and pre_mlp_layernorm to optimize the quantization process for this model.
Enabling Trinity Series Quantization: These new mappings are crucial for correctly applying AWQ and SmoothQuant to the Trinity series of models, particularly AfmoeForCausalLM. Without these changes, benchmark results were extremely low, indicating that these mappings are essential for achieving acceptable quantized model quality.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces support for AFMoE models by adding the necessary mappings for AWQ and SmoothQuant. The changes are well-contained and follow existing patterns. I've identified a minor improvement opportunity in the regular expressions used for the AWQ mappings to ensure more precise matching. Overall, the implementation is solid.

src/llmcompressor/modifiers/awq/mappings.py

Signed-off-by: Colin Kealty <3266127+bartowski1182@users.noreply.github.com>

mergify · 2026-02-04T16:18:54Z

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional install to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

Signed-off-by: Colin Kealty <3266127+bartowski1182@users.noreply.github.com>

brian-dellabetta

Thanks for the contribution!

src/llmcompressor/modifiers/awq/mappings.py

HDCharles · 2026-02-13T16:15:24Z

would probably include a repro script for your test like: #2340

bartowski1182 · 2026-02-13T16:36:06Z

@HDCharles do you need me to do that or are you letting me know you'll be doing that?

HDCharles · 2026-02-17T16:09:58Z

yeah can you add a repro script to the PR description so it's clear how you tested it?

HDCharles · 2026-02-17T16:10:11Z

also now there are conflicts

mergify · 2026-02-17T16:11:10Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bartowski1182.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

kylesayrs · 2026-02-19T06:09:44Z

@bartowski1182 Could you please rebase on main? I'll make sure this gets in

gemini-code-assist bot reviewed Jan 30, 2026

Add AFMOE mappings for awq and smoothquant

6e6497d

Signed-off-by: Colin Kealty <3266127+bartowski1182@users.noreply.github.com>

bartowski1182 force-pushed the main branch from 7008bd0 to 6e6497d on January 31, 2026 02:03

Merge branch 'main' into main

774c906

dsikka added the ready label Feb 4, 2026

mergify bot added quality-failed and removed quality-failed labels Feb 4, 2026

Fix line length issue

96a2208

Signed-off-by: Colin Kealty <3266127+bartowski1182@users.noreply.github.com>

bartowski1182 force-pushed the main branch from f65bb8f to 96a2208 on February 4, 2026 23:07

bartowski1182 and others added 3 commits February 4, 2026 18:14

Merge branch 'main' into main

e87b19e

Merge branch 'main' into main

ee8ca27

Merge branch 'main' into main

8a156cc

brian-dellabetta approved these changes Feb 12, 2026

brian-dellabetta requested review from HDCharles and kylesayrs February 12, 2026 15:45

HDCharles approved these changes Feb 13, 2026

Merge branch 'main' into main

ac6ae2a

HDCharles self-requested a review February 17, 2026 16:10

mergify bot added the needs-rebase label Feb 17, 2026

bartowski1182 wants to merge 7 commits into vllm-project:main from bartowski1182:main

5 participants
