
Conversation

@JJJYmmm (Contributor) commented Jan 9, 2026

What does this PR do?

Fix the loading of Qwen3VLMoe experts.

test script:

from transformers import AutoProcessor, AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", torch_dtype="auto", device_map="auto")

before:

model.language_model.layers.{0...47}.mlp.experts.down_proj    | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 768, 2048]) vs model:torch.Size([128, 2048, 768])  
model.language_model.layers.{0...47}.mlp.experts.gate_up_proj | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 2048, 1536]) vs model:torch.Size([128, 1536, 2048])

The reason is that the official Qwen3VLMoe checkpoint stores the expert weights with shape [num_experts, out_features, in_features], whereas in the latest code these weights are transposed in the last two dimensions. This PR transposes them back during conversion.
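
For illustration, a minimal sketch of what the conversion fix amounts to (the function name and the key suffixes it matches are assumptions for this example, not the exact code in the PR):

import torch

def transpose_expert_weights(state_dict):
    # Sketch only: swap the last two dims of the MoE expert weights so the
    # checkpoint layout lines up with the layout the current modeling code expects.
    for key in list(state_dict.keys()):
        if key.endswith(("mlp.experts.gate_up_proj", "mlp.experts.down_proj")):
            state_dict[key] = state_dict[key].transpose(-1, -2).contiguous()
    return state_dict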

The model now loads successfully after the fix. 🫡

@github-actions bot commented Jan 9, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43201&sha=a9b5dc

@vasqu (Contributor) commented Jan 12, 2026

Similar issue as in #43227, where the reverse mapping fails. cc @Cyrilvallez

@IlyasMoutawwakil (Member)

@vasqu A quick fix that I found for now is to rename the tensor first and then transpose it.
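
Roughly, the ordering being suggested (the helper name and the mapping object are hypothetical, only to show the rename happening before the transpose):

import torch

def convert_key(old_key, tensor, weight_renaming):
    # Hypothetical sketch: apply the key rename first, then transpose the
    # expert weights, so the reverse mapping is computed on the original name.
    new_key = weight_renaming.get(old_key, old_key)
    if new_key.endswith(("mlp.experts.gate_up_proj", "mlp.experts.down_proj")):
        tensor = tensor.transpose(-1, -2).contiguous()
    return new_key, tensor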

@JJJYmmm (Contributor, Author) commented Jan 14, 2026

I'll wait for #43227 to be merged and then apply the final fix for the transposed experts. 🫡

@Cyrilvallez (Member) commented Jan 16, 2026

Closing as superseded by #43307!
