Description
System Info
- `optimum` version: 1.27.0
- `transformers` version: 4.53.3
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.13.3
- Huggingface_hub version: 0.34.4
- PyTorch version (GPU?): 2.8.0+cpu (cuda available: False)
- Tensorflow version (GPU?): not installed (cuda available: NA)
Who can help?
Hi team,
I am unable to export the Qwen/Qwen3-4B-Thinking-2507 model to ONNX using optimum-cli. The export consistently fails with a RuntimeError related to a tensor size mismatch in the cache update logic.
To Reproduce:
- Set up an environment with the latest versions of optimum[onnxruntime], transformers, and torch.
- Run the following command:
"optimum-cli export onnx --model Qwen/Qwen3-4B-Thinking-2507 qwen3-4b-thinking-onnx"
Bug Confirmation:
To confirm my environment is working correctly, I successfully exported the standard Qwen/Qwen2-1.5B-Instruct model without any issues. This strongly suggests the bug is specific to the architecture or implementation of the -Thinking variant.
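A possible explanation for the 80-vs-128 mismatch in the error below (my guess, using values from the published Qwen3-4B config, not verified against the optimum source): Qwen3 configs set an explicit head_dim of 128, which does not equal hidden_size / num_attention_heads (2560 / 32 = 80). Any dummy past-key-values generator that derives the per-head size from hidden_size would therefore build a cache tensor that cannot be concatenated with the real key_states. A minimal arithmetic sketch:

```python
# Values copied from the published Qwen3-4B config (assumed here, not fetched).
config = {
    "hidden_size": 2560,
    "num_attention_heads": 32,
    "head_dim": 128,  # Qwen3 sets this explicitly; it is NOT hidden_size / heads
}

# A dummy-cache generator that derives the per-head size the classic way:
derived_head_dim = config["hidden_size"] // config["num_attention_heads"]
print(derived_head_dim)  # 80 -> last dim of the traced dummy cache tensors

# The model itself honors the explicit head_dim when present:
actual_head_dim = config.get("head_dim", derived_head_dim)
print(actual_head_dim)  # 128 -> last dim of key_states at runtime

# torch.cat([cache, key_states], dim=-2) then fails, since 80 != 128 —
# matching "Expected size 80 but got size 128" in the traceback below.
```

For Qwen2-1.5B-Instruct the two values coincide, which would explain why that export succeeds.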
Error Log for Qwen3-4B-Thinking-2507:
C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\cache_utils.py:552: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
not self.key_cache[layer_idx].numel() # prefers not t.numel() to len(t) == 0 to export the model
Traceback (most recent call last):
File "", line 198, in run_module_as_main
File "", line 88, in run_code
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Scripts\optimum-cli.exe_main.py", line 7, in
sys.exit(main())
~~~~^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\commands\optimum_cli.py", line 208, in main
service.run()
~~~~~~~~~~~^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\commands\export\onnx.py", line 276, in run
main_export(
~~~~~~~~~~~^
model_name_or_path=self.args.model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<23 lines>...
**input_shapes,
^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx_main.py", line 418, in main_export
onnx_export_from_model(
~~~~~~~~~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<19 lines>...
**kwargs_shapes,
^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 1186, in onnx_export_from_model
_, onnx_outputs = export_models(
~~~~~~~~~~~~~^
models_and_onnx_configs=models_and_onnx_configs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
model_kwargs=model_kwargs,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 770, in export_models
export(
~~~~~~^
model=submodel,
^^^^^^^^^^^^^^^
...<9 lines>...
model_kwargs=model_kwargs,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 874, in export
export_output = export_pytorch(
model,
...<7 lines>...
model_kwargs=model_kwargs,
)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 567, in export_pytorch
onnx_export(
~~~~~~~~~~~^
model,
^^^^^^
...<6 lines>...
opset_version=opset,
^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx_init.py", line 424, in export
export(
~~~~~~^
model,
^^^^^^
...<15 lines>...
autograd_inlining=autograd_inlining,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 522, in export
_export(
~~~~~~~^
model,
^^^^^^
...<14 lines>...
autograd_inlining=autograd_inlining,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 1457, in _export
graph, params_dict, torch_out = _model_to_graph(
~~~~~~~~~~~~~~~^
model,
^^^^^^
...<8 lines>...
dynamic_axes=dynamic_axes,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 1080, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 964, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 871, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
model,
^^^^^^
...<3 lines>...
_return_inputs_states=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 1504, in _get_trace_graph
outs = ONNXTracedModule(
f, strict, _force_outplace, return_inputs, _return_inputs_states
)(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 138, in forward
graph, _out = torch._C._create_graph_by_tracing(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
wrapper,
^^^^^^^^
...<3 lines>...
self._force_outplace,
^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
~~~~~~~~~~^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\model_patcher.py", line 504, in patched_forward
outputs = self.orig_forward(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\utils\generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 570, in forward
outputs: BaseModelOutputWithPast = self.model(
~~~~~~~~~~^
input_ids=input_ids,
^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\Users\v\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\utils\generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 458, in forward
layer_outputs = decoder_layer(
hidden_states,
...<7 lines>...
**flash_attn_kwargs,
)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\modeling_layers.py", line 83, in call
return super().call(*args, **kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 262, in forward
hidden_states, self_attn_weights = self.self_attn(
~~~~~~~~~~~~~~^
hidden_states=hidden_states,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<7 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 211, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\cache_utils.py", line 557, in update
self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 80 but got size 128 for tensor number 1 in the list.
Thank you for looking into this!
BR,
MrB
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
- Set up an environment with the latest versions of optimum[onnxruntime], transformers, and torch.
- Run the following command:
optimum-cli export onnx --model Qwen/Qwen3-4B-Thinking-2507 qwen3-4b-thinking-onnx
Expected behavior
The export should complete successfully and produce an ONNX model. Instead, it fails with the RuntimeError shown in the log above.