
ONNX export fails for Qwen/Qwen3-4B-Thinking-2507 with RuntimeError #2351

@MrBobertus

Description


System Info

- `optimum` version: 1.27.0
- `transformers` version: 4.53.3
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.13.3
- Huggingface_hub version: 0.34.4
- PyTorch version (GPU?): 2.8.0+cpu (cuda available: False)
- Tensorflow version (GPU?): not installed (cuda available: NA)

Who can help?

Hi team,

I am unable to export the Qwen/Qwen3-4B-Thinking-2507 model to ONNX using optimum-cli. The export consistently fails with a RuntimeError related to a tensor size mismatch in the cache update logic.

To Reproduce:

  • Set up an environment with the latest versions of optimum[onnxruntime], transformers, and torch.
  • Run the following command:
    "optimum-cli export onnx --model Qwen/Qwen3-4B-Thinking-2507 qwen3-4b-thinking-onnx"

Bug Confirmation:
To confirm my environment is working correctly, I successfully exported the standard Qwen/Qwen2-1.5B-Instruct model without any issues. This strongly suggests the bug is specific to the architecture or implementation of the -Thinking variant.

Error Log for Qwen3-4B-Thinking-2507:
C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\cache_utils.py:552: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
not self.key_cache[layer_idx].numel() # prefers not t.numel() to len(t) == 0 to export the model
Traceback (most recent call last):
File "", line 198, in run_module_as_main
File "", line 88, in run_code
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Scripts\optimum-cli.exe_main
.py", line 7, in
sys.exit(main())
~~~~^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\commands\optimum_cli.py", line 208, in main
service.run()
~~~~~~~~~~~^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\commands\export\onnx.py", line 276, in run
main_export(
~~~~~~~~~~~^
model_name_or_path=self.args.model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<23 lines>...
**input_shapes,
^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx_main
.py", line 418, in main_export
onnx_export_from_model(
~~~~~~~~~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<19 lines>...
**kwargs_shapes,
^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 1186, in onnx_export_from_model
_, onnx_outputs = export_models(
~~~~~~~~~~~~~^
models_and_onnx_configs=models_and_onnx_configs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
model_kwargs=model_kwargs,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 770, in export_models
export(
~~~~~~^
model=submodel,
^^^^^^^^^^^^^^^
...<9 lines>...
model_kwargs=model_kwargs,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 874, in export
export_output = export_pytorch(
model,
...<7 lines>...
model_kwargs=model_kwargs,
)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 567, in export_pytorch
onnx_export(
~~~~~~~~~~~^
model,
^^^^^^
...<6 lines>...
opset_version=opset,
^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx_init
.py", line 424, in export
export(
~~~~~~^
model,
^^^^^^
...<15 lines>...
autograd_inlining=autograd_inlining,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 522, in export
_export(
~~~~~~~^
model,
^^^^^^
...<14 lines>...
autograd_inlining=autograd_inlining,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 1457, in _export
graph, params_dict, torch_out = _model_to_graph(
~~~~~~~~~~~~~~~^
model,
^^^^^^
...<8 lines>...
dynamic_axes=dynamic_axes,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 1080, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 964, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 871, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
model,
^^^^^^
...<3 lines>...
_return_inputs_states=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 1504, in _get_trace_graph
outs = ONNXTracedModule(
f, strict, _force_outplace, return_inputs, _return_inputs_states
)(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 138, in forward
graph, _out = torch._C._create_graph_by_tracing(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
wrapper,
^^^^^^^^
...<3 lines>...
self._force_outplace,
^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
~~~~~~~~~~^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\model_patcher.py", line 504, in patched_forward
outputs = self.orig_forward(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\utils\generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 570, in forward
outputs: BaseModelOutputWithPast = self.model(
~~~~~~~~~~^
input_ids=input_ids,
^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\Users\v\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\utils\generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 458, in forward
layer_outputs = decoder_layer(
hidden_states,
...<7 lines>...
**flash_attn_kwargs,
)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\modeling_layers.py", line 83, in call
return super().call(*args, **kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 262, in forward
hidden_states, self_attn_weights = self.self_attn(
~~~~~~~~~~~~~~^
hidden_states=hidden_states,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<7 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 211, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\cache_utils.py", line 557, in update
self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 80 but got size 128 for tensor number 1 in the list.
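A plausible reading of the size mismatch (my hypothesis, not a confirmed diagnosis): Qwen3 configs set an explicit head_dim of 128, while hidden_size // num_attention_heads works out to 80, so any export code path that derives the head dimension from hidden_size instead of reading config.head_dim would build dummy past-key-value caches of the wrong width. The numbers below are what I believe the Qwen3-4B config contains; please verify against the model's config.json:

```python
# Hypothetical sanity check for the 80-vs-128 mismatch.
# These values are assumed from the published Qwen3-4B config,
# not read from the model at runtime.
hidden_size = 2560
num_attention_heads = 32
head_dim = 128  # Qwen3 sets head_dim explicitly in its config

# Deriving the head dimension the "classic" way gives a different number:
derived_head_dim = hidden_size // num_attention_heads
print(derived_head_dim, head_dim)  # 80 vs 128 -- matches the error message
```

If this holds, the fix would be for the exporter's dummy cache generator to prefer config.head_dim when it is present rather than deriving it; newer optimum releases may already handle this.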

Thank you for looking into this!
BR,
MrB

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

To Reproduce:

  • Set up an environment with the latest versions of optimum[onnxruntime], transformers, and torch.
  • Run the following command:
    "optimum-cli export onnx --model Qwen/Qwen3-4B-Thinking-2507 qwen3-4b-thinking-onnx"

Expected behavior

The export should complete and produce a working ONNX model in qwen3-4b-thinking-onnx. Instead, it fails with the RuntimeError shown in the log at the beginning of this report.


Labels: bug (Something isn't working)
