Skip to content

Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip#5779

Open
qgallouedec wants to merge 1 commit into
mainfrom
fix-tiny-qwen3-vl
Open

Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip#5779
qgallouedec wants to merge 1 commit into
mainfrom
fix-tiny-qwen3-vl

Conversation

@qgallouedec
Copy link
Copy Markdown
Member

@qgallouedec qgallouedec commented May 15, 2026

Qwen3-VL has a deepstack mechanism: it taps the vision encoder at a few intermediate layer indexes and feeds those features into the language model. The indexes are stored in vision_config.deepstack_visual_indexes; for each index, a trainable Qwen3VLVisionPatchMerger is instantiated in deepstack_merger_list.

The tiny model inherited deepstack_visual_indexes=[5, 11, 17] from the full Qwen3-VL config but kept only depth=2 (vision-encoder layers 0 and 1). At forward time:

for layer_num, blk in enumerate(self.blocks):       # layer_num ∈ {0, 1}
    hidden_states = blk(...)
    if layer_num in self.deepstack_visual_indexes:  # [5, 11, 17] — never matches
        deepstack_merger_list[...](hidden_states)

So the 3 mergers were instantiated but never called → no gradient → tests had a model.visual.deepstack_merger_list skip with the vague "for some reason, these params are not updated" comment.

To fix: in scripts/generate_tiny_models/for_conditional_generation/qwen3_vl_for_conditional_generation.py, add to vision_config:

"deepstack_visual_indexes": [1],

Note

Low Risk
Low risk: config-only change to the tiny Qwen3-VL vision setup plus removal of a test-only parameter skip; main risk is CI flakiness if the updated params still don’t receive gradients in some environments.

Overview
Ensures tiny-Qwen3VLForConditionalGeneration uses a reachable vision_config.deepstack_visual_indexes (set to [1] for depth=2), so the model’s deepstack patch mergers are invoked during forward/backprop.

Updates CI to test the PR revision of trl-internal-testing/tiny-Qwen3VLForConditionalGeneration, and removes the prior DPO/SFT test skips for model.visual.deepstack_merger_list so those parameters are now expected to update during VLM training tests.

Reviewed by Cursor Bugbot for commit 62f4446. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment on lines -43 to -47
"num_hidden_layers": 2,
# Real Qwen3-VL has depth=24 with deepstack at layers [5, 11, 17]. With depth=2 here, those
# indexes are unreachable, so the deepstack mergers were instantiated but never invoked.
# Pick an index inside [0, depth).
"deepstack_visual_indexes": [1],
"hidden_size": 16,
"num_attention_heads": 4,
"num_key_value_heads": 2,
"embed_dim": 64,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_hidden_layers, num_attention_heads, num_key_value_heads, embed_dim aren't used

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62f4446f5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/conftest.py
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 62f4446. Configure here.

Comment thread tests/conftest.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants