
Refactor Matching Logic to Use compressed-tensors Utilities #2284

Open

Etelis wants to merge 9 commits into vllm-project:main from Etelis:refactor/consolidate-matching-logic

Conversation

@Etelis
Contributor

@Etelis Etelis commented Jan 24, 2026

Consolidates 17 redundant matching functions into standardized compressed-tensors APIs.

Fixes #1686

  • Deleted 15 functions from module.py: get_layers, get_params, get_prunable_layers, get_quantizable_layers, match_targets, etc.
  • Added 2 helpers: expand_special_targets() (backward compatibility) and build_parameterized_layers() (sketched below)
  • Updated modifiers: SparseGPT, magnitude pruning, constant pruning to use new APIs
  • Bug fix: Added missing self.targets parameter in magnitude pruning
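
For context, here is a minimal sketch of what build_parameterized_layers() could look like on top of compressed-tensors. match_named_modules is the real compressed-tensors utility; the ModelParameterizedLayer stand-in dataclass and the function body are assumptions based on this PR's description, not the actual diff:

```python
from dataclasses import dataclass
from typing import Dict, List

import torch
from compressed_tensors.utils import match_named_modules


@dataclass
class ModelParameterizedLayer:
    # stand-in for llmcompressor's dataclass; field names are assumptions
    layer_name: str
    layer: torch.nn.Module
    param_name: str
    param: torch.nn.Parameter


def build_parameterized_layers(
    model: torch.nn.Module, targets: List[str]
) -> Dict[str, ModelParameterizedLayer]:
    """Resolve targets (FQNs, "re:" regexes, or class names) to weight params."""
    parameterized: Dict[str, ModelParameterizedLayer] = {}
    for name, layer in match_named_modules(model, targets):
        for param_name, param in layer.named_parameters(recurse=False):
            # a later commit in this PR stores param_name as the full path,
            # e.g. "0.weight" rather than "0"
            full_name = f"{name}.{param_name}"
            parameterized[full_name] = ModelParameterizedLayer(
                layer_name=name, layer=layer, param_name=full_name, param=param
            )
    return parameterized
```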

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Etelis, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring effort to standardize the module and parameter matching logic within the llmcompressor library. By leveraging utilities from the compressed-tensors library, it eliminates a substantial amount of redundant code and ensures a consistent approach to identifying and processing layers and parameters across various pruning modifiers. This change enhances the maintainability and extensibility of the codebase, making it more robust for future development.

Highlights

  • Code Consolidation: Consolidated 17 redundant matching functions into standardized APIs provided by the compressed-tensors library, significantly reducing code duplication.
  • New Helper Functions: Introduced two new helper functions: expand_special_targets() for backward compatibility with deprecated target constants and build_parameterized_layers() for constructing ModelParameterizedLayer objects using the new compressed-tensors utilities (expand_special_targets() is sketched after this list).
  • Modifier Updates: Updated existing pruning modifiers, including Constant Pruning, Magnitude Pruning, and SparseGPT, to utilize the newly integrated compressed-tensors matching logic.
  • Bug Fix: Addressed a bug in magnitude pruning by correctly passing the self.targets parameter to the build_parameterized_layers function.
  • Deprecated Function Removal: Removed 15 deprecated matching-related functions from llmcompressor.utils.pytorch.module, streamlining the module's public interface.
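
A plausible shape for the expand_special_targets() backward-compatibility helper, assuming the special-constant names from the commit message; the real layer-type list and deprecation message may differ:

```python
import warnings
from typing import List

# layer types assumed for illustration; the PR keeps the real list in a
# module-level PRUNABLE_LAYER_TYPES constant in sgpt_base.py
PRUNABLE_LAYER_TYPES = ["Linear", "Conv1d", "Conv2d", "Conv3d"]


def expand_special_targets(targets: List[str]) -> List[str]:
    """Expand deprecated special constants into explicit class-name targets."""
    expanded: List[str] = []
    for target in targets:
        if target in ("__ALL_PRUNABLE__", "__ALL_QUANTIZABLE__"):
            warnings.warn(
                f"{target} is deprecated; list layer types explicitly",
                DeprecationWarning,
            )
            expanded.extend(PRUNABLE_LAYER_TYPES)
        else:
            expanded.append(target)
    return expanded
```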

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a significant refactoring that replaces custom module and parameter matching logic with standardized utilities from the compressed-tensors library. This consolidation simplifies the codebase by removing around 15 redundant functions and updating several modifiers (SparseGPT, magnitude pruning, constant pruning) to use the new, more explicit APIs. The changes are well-executed and improve code clarity and maintainability. Additionally, a bug in the magnitude pruning modifier, where the targets parameter was missing, has been fixed. The new logic is supported by new unit tests.

My feedback includes a couple of suggestions to further improve maintainability by reducing code duplication and adhering to Python's import best practices.

Collaborator

@brian-dellabetta brian-dellabetta left a comment


Thanks for looking into this. Leaving a few comments for now.

Collaborator

@dsikka dsikka left a comment


The commit history looks strange. Can you rebase?

Consolidates 17 redundant matching functions into standardized
compressed-tensors APIs to eliminate code duplication and improve
maintainability.

Changes:
- Add expand_special_targets() helper for backward-compatible constant expansion
- Add build_parameterized_layers() to replace get_layers_params()
- Remove 15 old matching functions from module.py (~260 lines deleted)
- Update SparseGPT, magnitude, and constant pruning modifiers
- Replace get_linear_layers() with match_named_modules()
- Update tests to use new APIs and PyTorch's native get_submodule()

Special constants (__ALL_PRUNABLE__, __ALL_QUANTIZABLE__) still work
with deprecation warnings for backward compatibility.

Fixes vllm-project#1686

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
@Etelis Etelis force-pushed the refactor/consolidate-matching-logic branch from 9e64749 to 59a7c61 on January 26, 2026 at 17:27
- Move PRUNABLE_LAYER_TYPES to module-level constant in sgpt_base.py
- Move match_named_parameters import to top of module.py
- Remove unused QAT imports from module.py
- Remove test_get_submodule test (tests PyTorch built-in)

Signed-off-by: Itay Etlis <itay.etelis@gmail.com>
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
@Etelis
Contributor Author

Etelis commented Jan 26, 2026

Addressed all review comments:

  • PRUNABLE_LAYER_TYPES constant: Created a module-level constant in sgpt_base.py and used it in both places
  • Import location: Moved match_named_parameters import to top of module.py
  • Unused QAT imports: Removed (they were only used in deleted functions)
  • test_get_submodule: Removed the test (tests PyTorch built-in)

@Etelis
Contributor Author

Etelis commented Jan 26, 2026

Regarding the QAT imports question:

The QAT layers are still supported because they have the same __name__ as regular layers:

  • QATLinear.__name__ = "Linear"
  • QATConv2d.__name__ = "Conv2d"
  • QATConv3d.__name__ = "Conv3d"

So match_named_modules(model, ["Linear"]) will match both nn.Linear and QATLinear because it checks by class name (module.__class__.__name__).

The explicit imports were only needed for the old isinstance() checks. The new class-name-based matching in compressed-tensors doesn't require them.
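
A quick demonstration of this, assuming the QATLinear/QATConv aliases refer to torch.ao's QAT modules; the assertions below hold in current PyTorch:

```python
import torch
import torch.ao.nn.qat as qat

# the QAT variants keep the base class __name__:
assert torch.nn.Linear.__name__ == "Linear"
assert qat.Linear.__name__ == "Linear"
assert qat.Conv2d.__name__ == "Conv2d"


# the check that class-name matching effectively performs:
def matches_class_name(module: torch.nn.Module, target: str) -> bool:
    return module.__class__.__name__ == target
```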

@brian-dellabetta
Collaborator

Regarding the QAT imports question:

The QAT layers are still supported because they have the same __name__ as regular layers:

* `QATLinear.__name__` = "Linear"

* `QATConv2d.__name__` = "Conv2d"

* `QATConv3d.__name__` = "Conv3d"

So match_named_modules(model, ["Linear"]) will match both nn.Linear and QATLinear because it checks by class name (module.__class__.__name__).

The explicit imports were only needed for the old isinstance() checks. The new class-name-based matching in compressed-tensors doesn't require them.

Ah ok great, good catch. I missed that the names lined up.

Collaborator

@brian-dellabetta brian-dellabetta left a comment


approving pending @kylesayrs review

@mergify
Contributor

mergify bot commented Jan 26, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

Signed-off-by: Itay Etlis <itay.etelis@gmail.com>
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
@HDCharles HDCharles added the ready label Jan 27, 2026
Collaborator

@HDCharles HDCharles left a comment


see comment and test failures

Etelis and others added 2 commits January 27, 2026 20:50
- Fix match_named_parameters not matching class names (e.g., "Linear")
  by switching to match_named_modules which supports class name matching
- Inline match_modules function as it was only used in one place
- Remove match_modules function from helpers.py

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
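
An illustrative pattern for the fix described in this commit (not the PR's exact code): class-name targets such as "Linear" resolve against modules rather than parameter names, so modules are matched first and parameters read from each match:

```python
import torch
from compressed_tensors.utils import match_named_modules

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())

# match modules by class name, then walk each match's own parameters
for name, module in match_named_modules(model, ["Linear"]):
    for param_name, param in module.named_parameters(recurse=False):
        print(f"{name}.{param_name}", tuple(param.shape))  # 0.weight (8, 8); 0.bias (8,)
```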
@Etelis Etelis force-pushed the refactor/consolidate-matching-logic branch from ab6373f to 47391cf on January 27, 2026 at 18:57
@Etelis Etelis requested a review from HDCharles January 27, 2026 18:57
HDCharles
HDCharles previously approved these changes Jan 27, 2026
Collaborator

@HDCharles HDCharles left a comment


Looking good

@mergify
Contributor

mergify bot commented Jan 30, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Etelis.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 30, 2026
Resolve merge conflicts:
- magnitude/base.py: Keep build_parameterized_layers usage
- helpers.py: Merge imports (keep match_named_modules, add new offload API)

Signed-off-by: Itay Etelis <itayetelis@gmail.com>
@Etelis
Contributor Author

Etelis commented Jan 30, 2026

Resolved the merge conflicts that arose.

Can you re-approve and get it merged?

@brian-dellabetta , @HDCharles

@mergify mergify bot removed the needs-rebase label Jan 30, 2026
- Fix build_parameterized_layers to set param_name as full path
  (e.g., "0.weight" instead of just "0")
- Remove extra brackets around sequential_targets in sgpt_base.py
  that caused TypeError: unhashable type 'list'

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
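
A minimal reproduction of the TypeError fixed here, using an assumed example target value:

```python
sequential_targets = ["LlamaDecoderLayer"]  # example value, assumed

print(set(sequential_targets))  # OK: elements are strings
try:
    set([sequential_targets])  # extra brackets put a list inside the set
except TypeError as err:
    print(err)  # unhashable type: 'list'
```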
@Etelis Etelis force-pushed the refactor/consolidate-matching-logic branch from 05eb539 to c48f32a on January 30, 2026 at 16:07
@Etelis
Contributor Author

Etelis commented Jan 30, 2026

Tested PR #2284 on a remote machine:

python3 -m pytest tests/llmcompressor/ --ignore=tests/llmcompressor/transformers --ignore=tests/llmcompressor/modeling

=========== 444 passed, 6 skipped, 7472 warnings in 91.67s (0:01:31) ===========

All base tests pass.

@Etelis
Contributor Author

Etelis commented Jan 30, 2026

Hey @brian-team,

I ran the full transformers compression test suite locally on a GPU machine (RTX 3090) and all tests pass (62 passed, 5 skipped).

The CI failure was a transient 504 Gateway Timeout from HuggingFace Hub:

requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models/nm-testing/tinysmokellama-3.2/xet-read-token/...

This is an infrastructure issue on HF's side, not a code problem. Could we re-run the failed CI job?

@mergify
Contributor

mergify bot commented Feb 2, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Etelis.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Collaborator

@brian-dellabetta brian-dellabetta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some tests have not been updated to account for the removed helper methods:

tests/llmcompressor/transformers/sparsegpt/test_sparsegpt_owl.py:10: in <module>
    from llmcompressor.utils.pytorch.module import get_layers
E   ImportError: cannot import name 'get_layers' from 'llmcompressor.utils.pytorch.module' (/home/runner/_work/_tool/Python/3.12.12/x64/lib/python3.12/site-packages/llmcompressor/utils/pytorch/module.py)
