Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip#5779
Fix tiny Qwen3-VL deepstack_visual_indexes and drop the test skip#5779qgallouedec wants to merge 1 commit into
deepstack_visual_indexes and drop the test skip#5779Conversation
| "num_hidden_layers": 2, | ||
| # Real Qwen3-VL has depth=24 with deepstack at layers [5, 11, 17]. With depth=2 here, those | ||
| # indexes are unreachable, so the deepstack mergers were instantiated but never invoked. | ||
| # Pick an index inside [0, depth). | ||
| "deepstack_visual_indexes": [1], | ||
| "hidden_size": 16, | ||
| "num_attention_heads": 4, | ||
| "num_key_value_heads": 2, | ||
| "embed_dim": 64, |
There was a problem hiding this comment.
num_hidden_layers, num_attention_heads, num_key_value_heads, embed_dim aren't used
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62f4446f5b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 62f4446. Configure here.

Qwen3-VL has a deepstack mechanism: it taps the vision encoder at a few intermediate layer indexes and feeds those features into the language model. The indexes are stored in
vision_config.deepstack_visual_indexes; for each index, a trainableQwen3VLVisionPatchMergeris instantiated indeepstack_merger_list.The tiny model inherited
deepstack_visual_indexes=[5, 11, 17]from the full Qwen3-VL config but kept onlydepth=2(vision-encoder layers0and1). At forward time:So the 3 mergers were instantiated but never called → no gradient → tests had a
model.visual.deepstack_merger_listskip with the vague "for some reason, these params are not updated" comment.To fix: in
scripts/generate_tiny_models/for_conditional_generation/qwen3_vl_for_conditional_generation.py, add tovision_config:Note
Low Risk
Low risk: config-only change to the tiny Qwen3-VL vision setup plus removal of a test-only parameter skip; main risk is CI flakiness if the updated params still don’t receive gradients in some environments.
Overview
Ensures
tiny-Qwen3VLForConditionalGenerationuses a reachablevision_config.deepstack_visual_indexes(set to[1]fordepth=2), so the model’s deepstack patch mergers are invoked during forward/backprop.Updates CI to test the PR revision of
trl-internal-testing/tiny-Qwen3VLForConditionalGeneration, and removes the prior DPO/SFT test skips formodel.visual.deepstack_merger_listso those parameters are now expected to update during VLM training tests.Reviewed by Cursor Bugbot for commit 62f4446. Bugbot is set up for automated code reviews on this repo. Configure here.