Conversation
Bump vLLM to >=0.16.1.dev (nightly), which includes Qwen3.5 model support. This requires torch 2.10 (resolved from the existing >=2.9.0 pin), an updated flash-attn wheel built against torch 2.10, and version overrides for nvidia-cutlass-dsl and quack-kernels.

Bump the transformers pin to 5c1c72b, which includes a rope validation fix for Qwen3.5 (huggingface/transformers#44272).

Add a trainer monkey-patch for a transformers bug where Qwen3.5 passes 3D MRoPE position_ids to decoder layers instead of 2D text_position_ids, which breaks flash attention and causes NaN gradients. The upstream fix is pending: huggingface/transformers#44399

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Duplicate nvidia-cutlass-dsl override renders new entry redundant
- Removed the duplicate nvidia-cutlass-dsl>=4.4.1 constraint, keeping only the >=4.4.0.dev1 entry which allows both dev versions and stable releases.
Or push these changes by commenting:
@cursor push caadd1da72
Preview (caadd1da72)
diff --git a/pyproject.toml b/pyproject.toml
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -79,7 +79,6 @@
# See: https://github.com/pytorch/pytorch/issues/166122
override-dependencies = [
"nvidia-cudnn-cu12>=9.15",
- "nvidia-cutlass-dsl>=4.4.1",
"transformers>=5.1.0.dev0",
"nvidia-cutlass-dsl>=4.4.0.dev1",
    "quack-kernels>=0.2.7",
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
 vocab_size = config.vocab_size
 hidden_size = config.hidden_size
-intermediate_size = config.intermediate_size
+intermediate_size = getattr(config, "intermediate_size", getattr(config, "moe_intermediate_size", 0))
Perf counter dense MLP uses wrong intermediate size for MoE
Low Severity
The fallback getattr(config, "intermediate_size", getattr(config, "moe_intermediate_size", 0)) sets intermediate_size to moe_intermediate_size for models that lack intermediate_size (like pure MoE models). This value is then used for dense_mlp_params on line 134. For MoE models with some dense layers (e.g., models with first_k_dense_replace), the dense layer intermediate size may differ from moe_intermediate_size, yielding an incorrect FLOP estimate.


