
Conversation

@JJJYmmm (Contributor) commented Jan 9, 2026

What does this PR do?

Fix the loading of Qwen3VLMoe experts.

test script:

from transformers import AutoProcessor, AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", torch_dtype="auto", device_map="auto")

before:

model.language_model.layers.{0...47}.mlp.experts.down_proj    | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 768, 2048]) vs model:torch.Size([128, 2048, 768])  
model.language_model.layers.{0...47}.mlp.experts.gate_up_proj | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 2048, 1536]) vs model:torch.Size([128, 1536, 2048])

The reason is that the official Qwen3VLMoe checkpoint stores the expert weights with shape [num_experts, out_features, in_features], whereas in the latest code these weights are transposed in the last two dimensions. This PR transposes them back during conversion.
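
For illustration, a minimal sketch of what the conversion fix amounts to (the function name and the key suffixes it matches are assumptions for this example, not the exact code in the PR):

import torch

def transpose_expert_weights(state_dict):
    # Sketch only: swap the last two dims of the MoE expert weights so the
    # checkpoint layout lines up with the layout the current modeling code expects.
    for key in list(state_dict.keys()):
        if key.endswith(("mlp.experts.gate_up_proj", "mlp.experts.down_proj")):
            state_dict[key] = state_dict[key].transpose(-1, -2).contiguous()
    return state_dict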

The model now loads successfully after the fix. 🫡

@github-actions bot commented Jan 9, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43201&sha=a9b5dc

@vasqu (Contributor) commented Jan 12, 2026

Similar issue as in #43227, where the reverse mapping fails. cc @Cyrilvallez

@IlyasMoutawwakil (Member)

@vasqu A quick fix that I found for now is to rename the tensor first and then transpose it.
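
Roughly, the ordering being suggested (the helper name and the mapping object are hypothetical, only to show the rename happening before the transpose):

import torch

def convert_key(old_key, tensor, weight_renaming):
    # Hypothetical sketch: apply the key rename first, then transpose the
    # expert weights, so the reverse mapping is computed on the original name.
    new_key = weight_renaming.get(old_key, old_key)
    if new_key.endswith(("mlp.experts.gate_up_proj", "mlp.experts.down_proj")):
        tensor = tensor.transpose(-1, -2).contiguous()
    return new_key, tensor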

@JJJYmmm (Contributor, Author) commented Jan 14, 2026

I'll wait for #43227 to be merged and then apply the final fix for the transposed experts. 🫡

@Cyrilvallez (Member) commented Jan 16, 2026

Closing as superseded by #43307!
