
[Bug]: Attempts to quantize Granite 4.0-h-small to 4bits fail #2338

@mramendi

Description


βš™οΈ Your current environment

The output of python collect_env.py
### Environment Information ###
Operating System: `Linux-6.8.0-94-generic-x86_64-with-glibc2.39`
Python Version: `3.12.3 (main, Nov  6 2025, 13:44:16) [GCC 13.3.0]`
llm-compressor Version: `0.9.0.1`
compressed-tensors Version: `0.13.0`
transformers Version: `4.57.3`
torch Version: `2.9.1`
CUDA Devices: `['NVIDIA RTX PRO 6000 Blackwell Workstation Edition']`
AMD Devices: `None`
NPU Devices: `None`

πŸ› Describe the bug

Attempts to quantize Granite 4.0-h-small to 4 bits fail in a variety of ways:

  • If I use simple w4a16, implemented following the FP8 example (source attached), I get:
✓ Quantization complete

Attempting 3D conversion...
Traceback (most recent call last):
  File "/root/test_w4a16_no_exclusion.py", line 95, in <module>
    main()
  File "/root/test_w4a16_no_exclusion.py", line 78, in main
    m.to_3d_expert()
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/modeling/granite4.py", line 40, in to_3d_expert
    self.weight.shape == torch.Size((dim0_mul, self.input_size))
AssertionError: Shape mismatch, please check.

test_w4a16_no_exclusion.py
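For what it's worth, the assertion fires in `to_3d_expert`, which expects the 2-D expert weight to still have shape `(dim0_mul, input_size)`. My guess (an assumption on my part, not something I verified in the llmcompressor source) is that after W4A16 compression the weight tensor is packed two 4-bit values per element, halving one dimension so the shape check can no longer match. A minimal illustration of that kind of nibble packing, using a hypothetical helper (the real packing lives in compressed-tensors, not in this function):

```python
def pack_int4_pairs(values):
    """Pack pairs of 4-bit values (0..15) into single bytes.

    Hypothetical illustration only; this is NOT llmcompressor's actual
    packing code, just a sketch of why a packed row shrinks.
    """
    assert len(values) % 2 == 0, "need an even number of nibbles"
    return [(values[i] << 4) | values[i + 1] for i in range(0, len(values), 2)]

row = [1, 2, 3, 4]             # four 4-bit weights
packed = pack_int4_pairs(row)  # two bytes
# The packed row is half as long, so a shape assertion written against the
# original (dim0_mul, input_size) would fail after compression.
print(len(row), len(packed))   # -> 4 2
```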

If I try GPTQ (source also attached), I get:

Traceback (most recent call last):
  File "/root/test_gptq_no_exclusion.py", line 156, in <module>
    main()
  File "/root/test_gptq_no_exclusion.py", line 142, in main
    oneshot(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 357, in oneshot
    one_shot()
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 172, in __call__
    self.apply_recipe_modifiers(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 222, in apply_recipe_modifiers
    pipeline(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
    pipeline(model, dataloader, dataset_args)
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 73, in __call__
    subgraphs = trace_subgraphs(model, sample_input, sequential_targets, ignore)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 135, in trace_subgraphs
    tracer.trace(
  File "/opt/venv/datasci/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/transformers_helpers.py", line 1485, in trace
    self.graph.erase_node(user)
  File "/opt/venv/datasci/lib/python3.12/site-packages/torch/fx/graph.py", line 1257, in erase_node
    raise RuntimeError(
RuntimeError: Tried to erase Node getitem_169 but it still had 1 users in the graph: {output: None}!

test_gptq_no_exclusion.py
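For context on the GPTQ failure: `torch.fx.Graph.erase_node` refuses to remove a node while other nodes still consume its output, which is exactly what the tracer hit with `getitem_169` (the graph `output` node still referenced it). A stripped-down model of that invariant in plain Python (toy classes, not the real torch.fx API):

```python
class Node:
    """Toy stand-in for a torch.fx graph node (illustration only)."""
    def __init__(self, name):
        self.name = name
        self.users = {}  # nodes that still consume this node's output

def erase_node(node):
    # Mirrors the guard in torch.fx.Graph.erase_node: erasing a node that
    # is still referenced would leave dangling inputs in the graph.
    if node.users:
        raise RuntimeError(
            f"Tried to erase Node {node.name} but it still had "
            f"{len(node.users)} users in the graph: {node.users}!"
        )

getitem = Node("getitem_169")
output = Node("output")
getitem.users[output] = None  # the graph output still reads getitem_169

try:
    erase_node(getitem)
except RuntimeError as e:
    print(e)  # same failure mode as the traceback above
```

This suggests the sequential tracer's graph cleanup is pruning a node that the Granite 4.0-h graph still routes to its output, rather than a problem in the recipe itself.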

I understand that supporting 4-bit quantization for this model might be too complicated, but if support cannot be offered, I would suggest that the README and the error messages indicate that this model is not supported at 4 bits (or only unsupported in INT4, if that's the case; I didn't really try MXFP4).

πŸ› οΈ Steps to reproduce

$ python test_w4a16_no_exclusion.py --model-name ibm-granite/granite-4.0-h-small --output granite-4.0-h-small-w4a16
$ python test_gptq_no_exclusion.py --model-name ibm-granite/granite-4.0-h-small --output granite-4.0-h-small-gptq-4bit

Labels: bug (Something isn't working)