
ONNX export fails for Qwen/Qwen3-4B-Thinking-2507 with RuntimeError #2351

@MrBobertus

Description


System Info

- `optimum` version: 1.27.0
- `transformers` version: 4.53.3
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.13.3
- Huggingface_hub version: 0.34.4
- PyTorch version (GPU?): 2.8.0+cpu (cuda available: False)
- Tensorflow version (GPU?): not installed (cuda available: NA)

Who can help?

Hi team,

I am unable to export the Qwen/Qwen3-4B-Thinking-2507 model to ONNX using optimum-cli. The export consistently fails with a RuntimeError related to a tensor size mismatch in the cache update logic.

To Reproduce:

  • Set up an environment with the latest versions of optimum[onnxruntime], transformers, and torch.
  • Run the following command:
    "optimum-cli export onnx --model Qwen/Qwen3-4B-Thinking-2507 qwen3-4b-thinking-onnx"

Bug Confirmation:
To confirm my environment is working correctly, I successfully exported the standard Qwen/Qwen2-1.5B-Instruct model without any issues. This strongly suggests the bug is specific to the architecture or implementation of the -Thinking variant.

Error Log for Qwen3-4B-Thinking-2507:
C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\cache_utils.py:552: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
not self.key_cache[layer_idx].numel() # prefers not t.numel() to len(t) == 0 to export the model
Traceback (most recent call last):
File "", line 198, in run_module_as_main
File "", line 88, in run_code
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Scripts\optimum-cli.exe_main
.py", line 7, in
sys.exit(main())
~~~~^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\commands\optimum_cli.py", line 208, in main
service.run()
~~~~~~~~~~~^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\commands\export\onnx.py", line 276, in run
main_export(
~~~~~~~~~~~^
model_name_or_path=self.args.model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<23 lines>...
**input_shapes,
^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx_main
.py", line 418, in main_export
onnx_export_from_model(
~~~~~~~~~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<19 lines>...
**kwargs_shapes,
^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 1186, in onnx_export_from_model
_, onnx_outputs = export_models(
~~~~~~~~~~~~~^
models_and_onnx_configs=models_and_onnx_configs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
model_kwargs=model_kwargs,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 770, in export_models
export(
~~~~~~^
model=submodel,
^^^^^^^^^^^^^^^
...<9 lines>...
model_kwargs=model_kwargs,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 874, in export
export_output = export_pytorch(
model,
...<7 lines>...
model_kwargs=model_kwargs,
)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\convert.py", line 567, in export_pytorch
onnx_export(
~~~~~~~~~~~^
model,
^^^^^^
...<6 lines>...
opset_version=opset,
^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx_init
.py", line 424, in export
export(
~~~~~~^
model,
^^^^^^
...<15 lines>...
autograd_inlining=autograd_inlining,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 522, in export
_export(
~~~~~~~^
model,
^^^^^^
...<14 lines>...
autograd_inlining=autograd_inlining,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 1457, in _export
graph, params_dict, torch_out = _model_to_graph(
~~~~~~~~~~~~~~~^
model,
^^^^^^
...<8 lines>...
dynamic_axes=dynamic_axes,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 1080, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 964, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\onnx\utils.py", line 871, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
~~~~~~~~~~~~~~~~~~~~~~~~~~^
model,
^^^^^^
...<3 lines>...
_return_inputs_states=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 1504, in _get_trace_graph
outs = ONNXTracedModule(
f, strict, _force_outplace, return_inputs, _return_inputs_states
)(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 138, in forward
graph, _out = torch._C._create_graph_by_tracing(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
wrapper,
^^^^^^^^
...<3 lines>...
self._force_outplace,
^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\jit_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
~~~~~~~~~~^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\optimum\exporters\onnx\model_patcher.py", line 504, in patched_forward
outputs = self.orig_forward(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\utils\generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 570, in forward
outputs: BaseModelOutputWithPast = self.model(
~~~~~~~~~~^
input_ids=input_ids,
^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\Users\v\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\utils\generic.py", line 943, in wrapper
output = func(self, *args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 458, in forward
layer_outputs = decoder_layer(
hidden_states,
...<7 lines>...
**flash_attn_kwargs,
)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\modeling_layers.py", line 83, in call
return super().call(*args, **kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 262, in forward
hidden_states, self_attn_weights = self.self_attn(
~~~~~~~~~~~~~~^
hidden_states=hidden_states,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<7 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\torch\nn\modules\module.py", line 1763, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\models\qwen3\modeling_qwen3.py", line 211, in forward
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<Username>\Desktop\onnx\qwen-onnx-env\Lib\site-packages\transformers\cache_utils.py", line 557, in update
self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key_states], dim=-2)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 80 but got size 128 for tensor number 1 in the list.
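A plausible reading of the size mismatch (my hypothesis, not a confirmed diagnosis): Qwen3 configs set an explicit head_dim of 128, while hidden_size // num_attention_heads works out to 80, so any export code path that derives the head dimension from hidden_size instead of reading config.head_dim would build dummy past-key-value caches of the wrong width. The numbers below are what I believe the Qwen3-4B config contains; please verify against the model's config.json:

```python
# Hypothetical sanity check for the 80-vs-128 mismatch.
# These values are assumed from the published Qwen3-4B config,
# not read from the model at runtime.
hidden_size = 2560
num_attention_heads = 32
head_dim = 128  # Qwen3 sets head_dim explicitly in its config

# Deriving the head dimension the "classic" way gives a different number:
derived_head_dim = hidden_size // num_attention_heads
print(derived_head_dim, head_dim)  # 80 vs 128 -- matches the error message
```

If this holds, the fix would be for the exporter's dummy cache generator to prefer config.head_dim when it is present rather than deriving it; newer optimum releases may already handle this.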

Thank you for looking into this!
BR,
MrB

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

To Reproduce:

  • Set up an environment with the latest versions of optimum[onnxruntime], transformers, and torch.
  • Run the following command:
    "optimum-cli export onnx --model Qwen/Qwen3-4B-Thinking-2507 qwen3-4b-thinking-onnx"

Expected behavior

The export should complete and produce a working ONNX model in qwen3-4b-thinking-onnx. Instead, it fails with the RuntimeError shown in the log at the beginning of this report.


Labels: bug (Something isn't working)
