
Refactor Matching Logic to Use compressed-tensors Utilities #2284

Open

Etelis wants to merge 9 commits into vllm-project:main from Etelis:refactor/consolidate-matching-logic

Conversation

@Etelis
Contributor

@Etelis Etelis commented Jan 24, 2026

Consolidates 17 redundant matching functions into standardized compressed-tensors APIs.

Fixes #1686

  • Deleted 15 functions from module.py: get_layers, get_params, get_prunable_layers, get_quantizable_layers, match_targets, etc.
  • Added 2 helpers: expand_special_targets() (backward compatibility) and build_parameterized_layers() (sketched below)
  • Updated modifiers: SparseGPT, magnitude pruning, constant pruning to use new APIs
  • Bug fix: Added missing self.targets parameter in magnitude pruning
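
For context, here is a minimal sketch of what build_parameterized_layers() could look like on top of compressed-tensors. match_named_modules is the real compressed-tensors utility; the ModelParameterizedLayer stand-in dataclass and the function body are assumptions based on this PR's description, not the actual diff:

```python
from dataclasses import dataclass
from typing import Dict, List

import torch
from compressed_tensors.utils import match_named_modules


@dataclass
class ModelParameterizedLayer:
    # stand-in for llmcompressor's dataclass; field names are assumptions
    layer_name: str
    layer: torch.nn.Module
    param_name: str
    param: torch.nn.Parameter


def build_parameterized_layers(
    model: torch.nn.Module, targets: List[str]
) -> Dict[str, ModelParameterizedLayer]:
    """Resolve targets (FQNs, "re:" regexes, or class names) to weight params."""
    parameterized: Dict[str, ModelParameterizedLayer] = {}
    for name, layer in match_named_modules(model, targets):
        for param_name, param in layer.named_parameters(recurse=False):
            # a later commit in this PR stores param_name as the full path,
            # e.g. "0.weight" rather than "0"
            full_name = f"{name}.{param_name}"
            parameterized[full_name] = ModelParameterizedLayer(
                layer_name=name, layer=layer, param_name=full_name, param=param
            )
    return parameterized
```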

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Etelis, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring effort to standardize the module and parameter matching logic within the llmcompressor library. By leveraging utilities from the compressed-tensors library, it eliminates a substantial amount of redundant code and ensures a consistent approach to identifying and processing layers and parameters across various pruning modifiers. This change enhances the maintainability and extensibility of the codebase, making it more robust for future development.

Highlights

  • Code Consolidation: Consolidated 17 redundant matching functions into standardized APIs provided by the compressed-tensors library, significantly reducing code duplication.
  • New Helper Functions: Introduced two new helper functions: expand_special_targets() for backward compatibility with deprecated target constants and build_parameterized_layers() for constructing ModelParameterizedLayer objects using the new compressed-tensors utilities (expand_special_targets() is sketched after this list).
  • Modifier Updates: Updated existing pruning modifiers, including Constant Pruning, Magnitude Pruning, and SparseGPT, to utilize the newly integrated compressed-tensors matching logic.
  • Bug Fix: Addressed a bug in magnitude pruning by correctly passing the self.targets parameter to the build_parameterized_layers function.
  • Deprecated Function Removal: Removed 15 deprecated matching-related functions from llmcompressor.utils.pytorch.module, streamlining the module's public interface.
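
A plausible shape for the expand_special_targets() backward-compatibility helper, assuming the special-constant names from the commit message; the real layer-type list and deprecation message may differ:

```python
import warnings
from typing import List

# layer types assumed for illustration; the PR keeps the real list in a
# module-level PRUNABLE_LAYER_TYPES constant in sgpt_base.py
PRUNABLE_LAYER_TYPES = ["Linear", "Conv1d", "Conv2d", "Conv3d"]


def expand_special_targets(targets: List[str]) -> List[str]:
    """Expand deprecated special constants into explicit class-name targets."""
    expanded: List[str] = []
    for target in targets:
        if target in ("__ALL_PRUNABLE__", "__ALL_QUANTIZABLE__"):
            warnings.warn(
                f"{target} is deprecated; list layer types explicitly",
                DeprecationWarning,
            )
            expanded.extend(PRUNABLE_LAYER_TYPES)
        else:
            expanded.append(target)
    return expanded
```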

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a significant refactoring that replaces custom module and parameter matching logic with standardized utilities from the compressed-tensors library. This consolidation simplifies the codebase by removing around 15 redundant functions and updating several modifiers (SparseGPT, magnitude pruning, constant pruning) to use the new, more explicit APIs. The changes are well-executed and improve code clarity and maintainability. Additionally, a bug in the magnitude pruning modifier, where the targets parameter was missing, has been fixed. The new logic is supported by new unit tests.

My feedback includes a couple of suggestions to further improve maintainability by reducing code duplication and adhering to Python's import best practices.

Collaborator

@brian-dellabetta brian-dellabetta left a comment


Thanks for looking into this. Leaving a few comments for now.

Collaborator

@dsikka dsikka left a comment


The commit history looks strange. Can you rebase?

Consolidates 17 redundant matching functions into standardized
compressed-tensors APIs to eliminate code duplication and improve
maintainability.

Changes:
- Add expand_special_targets() helper for backward-compatible constant expansion
- Add build_parameterized_layers() to replace get_layers_params()
- Remove 15 old matching functions from module.py (~260 lines deleted)
- Update SparseGPT, magnitude, and constant pruning modifiers
- Replace get_linear_layers() with match_named_modules()
- Update tests to use new APIs and PyTorch's native get_submodule()

Special constants (__ALL_PRUNABLE__, __ALL_QUANTIZABLE__) still work
with deprecation warnings for backward compatibility.

Fixes vllm-project#1686

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
@Etelis Etelis force-pushed the refactor/consolidate-matching-logic branch from 9e64749 to 59a7c61 on January 26, 2026 at 17:27
- Move PRUNABLE_LAYER_TYPES to module-level constant in sgpt_base.py
- Move match_named_parameters import to top of module.py
- Remove unused QAT imports from module.py
- Remove test_get_submodule test (tests PyTorch built-in)

Signed-off-by: Itay Etlis <itay.etelis@gmail.com>
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
@Etelis
Contributor Author

Etelis commented Jan 26, 2026

Addressed all review comments:

  • PRUNABLE_LAYER_TYPES constant: Created a module-level constant in sgpt_base.py and used it in both places
  • Import location: Moved match_named_parameters import to top of module.py
  • Unused QAT imports: Removed (they were only used in deleted functions)
  • test_get_submodule: Removed the test (tests PyTorch built-in)

@Etelis
Contributor Author

Etelis commented Jan 26, 2026

Regarding the QAT imports question:

The QAT layers are still supported because they have the same __name__ as regular layers:

  • QATLinear.__name__ = "Linear"
  • QATConv2d.__name__ = "Conv2d"
  • QATConv3d.__name__ = "Conv3d"

So match_named_modules(model, ["Linear"]) will match both nn.Linear and QATLinear because it checks by class name (module.__class__.__name__).

The explicit imports were only needed for the old isinstance() checks. The new class-name-based matching in compressed-tensors doesn't require them.
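
A quick demonstration of this, assuming the QATLinear/QATConv aliases refer to torch.ao's QAT modules; the assertions below hold in current PyTorch:

```python
import torch
import torch.ao.nn.qat as qat

# the QAT variants keep the base class __name__:
assert torch.nn.Linear.__name__ == "Linear"
assert qat.Linear.__name__ == "Linear"
assert qat.Conv2d.__name__ == "Conv2d"


# the check that class-name matching effectively performs:
def matches_class_name(module: torch.nn.Module, target: str) -> bool:
    return module.__class__.__name__ == target
```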

@brian-dellabetta
Collaborator

Regarding the QAT imports question:

The QAT layers are still supported because they have the same __name__ as regular layers:

* `QATLinear.__name__` = "Linear"

* `QATConv2d.__name__` = "Conv2d"

* `QATConv3d.__name__` = "Conv3d"

So match_named_modules(model, ["Linear"]) will match both nn.Linear and QATLinear because it checks by class name (module.__class__.__name__).

The explicit imports were only needed for the old isinstance() checks. The new class-name-based matching in compressed-tensors doesn't require them.

Ah ok great, good catch. I missed that the names lined up.

Collaborator

@brian-dellabetta brian-dellabetta left a comment


approving pending @kylesayrs review

@mergify
Contributor

mergify bot commented Jan 26, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

Signed-off-by: Itay Etlis <itay.etelis@gmail.com>
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
@HDCharles HDCharles added the ready label Jan 27, 2026
Collaborator

@HDCharles HDCharles left a comment


see comment and test failures

Etelis and others added 2 commits January 27, 2026 20:50
- Fix match_named_parameters not matching class names (e.g., "Linear")
  by switching to match_named_modules which supports class name matching
- Inline match_modules function as it was only used in one place
- Remove match_modules function from helpers.py

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
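
An illustrative pattern for the fix described in this commit (not the PR's exact code): class-name targets such as "Linear" resolve against modules rather than parameter names, so modules are matched first and parameters read from each match:

```python
import torch
from compressed_tensors.utils import match_named_modules

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())

# match modules by class name, then walk each match's own parameters
for name, module in match_named_modules(model, ["Linear"]):
    for param_name, param in module.named_parameters(recurse=False):
        print(f"{name}.{param_name}", tuple(param.shape))  # 0.weight (8, 8); 0.bias (8,)
```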
@Etelis Etelis force-pushed the refactor/consolidate-matching-logic branch from ab6373f to 47391cf on January 27, 2026 at 18:57
@Etelis Etelis requested a review from HDCharles January 27, 2026 18:57
HDCharles
HDCharles previously approved these changes Jan 27, 2026
Collaborator

@HDCharles HDCharles left a comment


Looking good

@mergify
Contributor

mergify bot commented Jan 30, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Etelis.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 30, 2026
Resolve merge conflicts:
- magnitude/base.py: Keep build_parameterized_layers usage
- helpers.py: Merge imports (keep match_named_modules, add new offload API)

Signed-off-by: Itay Etelis <itayetelis@gmail.com>
@Etelis
Contributor Author

Etelis commented Jan 30, 2026

Resolved the merge conflicts that arose.

Can you re-approve and get it merged?

@brian-dellabetta , @HDCharles

@mergify mergify bot removed the needs-rebase label Jan 30, 2026
- Fix build_parameterized_layers to set param_name as full path
  (e.g., "0.weight" instead of just "0")
- Remove extra brackets around sequential_targets in sgpt_base.py
  that caused TypeError: unhashable type 'list'

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
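
A minimal reproduction of the TypeError fixed here, using an assumed example target value:

```python
sequential_targets = ["LlamaDecoderLayer"]  # example value, assumed

print(set(sequential_targets))  # OK: elements are strings
try:
    set([sequential_targets])  # extra brackets put a list inside the set
except TypeError as err:
    print(err)  # unhashable type: 'list'
```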
@Etelis Etelis force-pushed the refactor/consolidate-matching-logic branch from 05eb539 to c48f32a on January 30, 2026 at 16:07
@Etelis
Contributor Author

Etelis commented Jan 30, 2026

Tested PR #2284 on a remote machine:

python3 -m pytest tests/llmcompressor/ --ignore=tests/llmcompressor/transformers --ignore=tests/llmcompressor/modeling

=========== 444 passed, 6 skipped, 7472 warnings in 91.67s (0:01:31) ===========

All base tests pass.

@Etelis
Contributor Author

Etelis commented Jan 30, 2026

Hey @brian-team,

I ran the full transformers compression test suite locally on a GPU machine (RTX 3090) and all tests pass (62 passed, 5 skipped).

The CI failure was a transient 504 Gateway Timeout from HuggingFace Hub:

requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://huggingface.co/api/models/nm-testing/tinysmokellama-3.2/xet-read-token/...

This is an infrastructure issue on HF's side, not a code problem. Could we re-run the failed CI job?

@mergify
Contributor

mergify bot commented Feb 2, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Etelis.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Collaborator

@brian-dellabetta brian-dellabetta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some tests have not been updated to account for the removed helper methods:

tests/llmcompressor/transformers/sparsegpt/test_sparsegpt_owl.py:10: in <module>
    from llmcompressor.utils.pytorch.module import get_layers
E   ImportError: cannot import name 'get_layers' from 'llmcompressor.utils.pytorch.module' (/home/runner/_work/_tool/Python/3.12.12/x64/lib/python3.12/site-packages/llmcompressor/utils/pytorch/module.py)
