Labels: bug (Something isn't working)
Description
⚙️ Your current environment
The output of `python collect_env.py`:
### Environment Information ###
Operating System: `Linux-6.8.0-94-generic-x86_64-with-glibc2.39`
Python Version: `3.12.3 (main, Nov 6 2025, 13:44:16) [GCC 13.3.0]`
llm-compressor Version: `0.9.0.1`
compressed-tensors Version: `0.13.0`
transformers Version: `4.57.3`
torch Version: `2.9.1`
CUDA Devices: `['NVIDIA RTX PRO 6000 Blackwell Workstation Edition']`
AMD Devices: `None`
NPU Devices: `None`
🐛 Describe the bug
Attempts to quantize Granite 4.0-h-small to 4 bits fail in a variety of ways:
- If I use simple W4A16, implemented following the FP8 example (source attached), I get:

```
✅ Quantization complete
Attempting 3D conversion...
Traceback (most recent call last):
  File "/root/test_w4a16_no_exclusion.py", line 95, in <module>
    main()
  File "/root/test_w4a16_no_exclusion.py", line 78, in main
    m.to_3d_expert()
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/modeling/granite4.py", line 40, in to_3d_expert
    self.weight.shape == torch.Size((dim0_mul, self.input_size))
AssertionError: Shape mismatch, please check.
```
- If I try GPTQ (source also attached), I get:

```
Traceback (most recent call last):
  File "/root/test_gptq_no_exclusion.py", line 156, in <module>
    main()
  File "/root/test_gptq_no_exclusion.py", line 142, in main
    oneshot(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 357, in oneshot
    one_shot()
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 172, in __call__
    self.apply_recipe_modifiers(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 222, in apply_recipe_modifiers
    pipeline(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
    pipeline(model, dataloader, dataset_args)
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 73, in __call__
    subgraphs = trace_subgraphs(model, sample_input, sequential_targets, ignore)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 135, in trace_subgraphs
    tracer.trace(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/transformers_helpers.py", line 1485, in trace
    self.graph.erase_node(user)
  File "/opt/venv/datasci/lib/python3.12/site-packages/torch/fx/graph.py", line 1257, in erase_node
    raise RuntimeError(
RuntimeError: Tried to erase Node getitem_169 but it still had 1 users in the graph: {output: None}!
```
I understand that supporting 4-bit for this model might be too complicated, but if support cannot be offered, I would suggest that the README and the error messages indicate that this model is not supported in 4-bit (or only in int4, if that's the case - I didn't really try MXFP4).
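For context, both failing paths presumably follow the standard llm-compressor recipe shape. The following is only a hedged sketch of such a recipe (the actual attached scripts may differ in scheme details and ignore lists):

```yaml
# Hypothetical recipe sketch, not the exact attached scripts.
# W4A16 variant: scheme-based weight-only quantization via QuantizationModifier.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      ignore: ["lm_head"]
      targets: ["Linear"]
      scheme: W4A16
# GPTQ variant: swap in GPTQModifier with the same scheme/targets, which
# routes through the sequential tracing pipeline that raises the
# erase_node RuntimeError shown above.
```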
🛠️ Steps to reproduce
```shell
$ python test_w4a16_no_exclusion.py --model-name ibm-granite/granite-4.0-h-small --output granite-4.0-h-small-w4a16
$ python test_gptq_no_exclusion.py --model-name ibm-granite/granite-4.0-h-small --output granite-4.0-h-small-gptq-4bit
```