feat: add Qwen3.5 support #1946

Open
JohannesHa wants to merge 4 commits into main from feature/qwen3-5

Conversation


@JohannesHa JohannesHa commented Mar 4, 2026

Bump vLLM to >=0.16.1.dev (nightly), which includes Qwen3.5 model support. This requires torch 2.10 (resolved from the existing >=2.9.0 pin), an updated flash-attn wheel built against torch 2.10, and version overrides for nvidia-cutlass-dsl and quack-kernels.

Bump the transformers pin to 5c1c72b, which includes a RoPE validation fix for Qwen3.5 (huggingface/transformers#44272).

Add a trainer monkey-patch for a transformers bug where Qwen3.5 passes 3D MRoPE position_ids to decoder layers instead of 2D text_position_ids, which breaks flash attention and causes NaN gradients. The upstream fix is pending: huggingface/transformers#44399
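A minimal sketch of what such a patch does, using NumPy arrays in place of torch tensors; the helper name `to_text_position_ids` is illustrative and not the actual trainer API:

```python
import numpy as np

def to_text_position_ids(position_ids):
    """Collapse 3D MRoPE position_ids (3, batch, seq) to 2D (batch, seq).

    For text-only inputs the three MRoPE components (temporal, height,
    width) are identical, so the first component is the text position
    sequence that flash attention expects.
    """
    if position_ids is not None and position_ids.ndim == 3:
        return position_ids[0]
    return position_ids

# 3D MRoPE positions for batch size 1, seq_len 4
mrope = np.tile(np.arange(4), (3, 1, 1))   # shape (3, 1, 4)
text = to_text_position_ids(mrope)
print(text.shape)  # (1, 4)
```

The real patch would apply this reduction before the decoder layers see the tensor, which is the behavior the pending upstream fix restores.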



JohannesHa and others added 2 commits March 4, 2026 02:33

Co-Authored-By: Claude Opus 4.6 <[email protected]>
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Duplicate nvidia-cutlass-dsl override renders new entry redundant
    • Removed the duplicate nvidia-cutlass-dsl>=4.4.1 constraint, keeping only the >=4.4.0.dev1 entry which allows both dev versions and stable releases.
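The version-specifier semantics the autofix relies on can be checked with the `packaging` library; the version numbers below are the ones from the two conflicting overrides:

```python
from packaging.specifiers import SpecifierSet

# ">=4.4.0.dev1" mentions a pre-release, so under PEP 440 it admits both
# dev builds and stable releases; ">=4.4.1" would exclude dev builds.
keep = SpecifierSet(">=4.4.0.dev1")
drop = SpecifierSet(">=4.4.1")

print(keep.contains("4.4.0.dev1"))  # True  (dev build allowed)
print(keep.contains("4.4.1"))       # True  (stable also allowed)
print(drop.contains("4.4.0.dev1"))  # False (dev build rejected)
```

This is why keeping only the `>=4.4.0.dev1` entry loses nothing: it is strictly more permissive than the removed `>=4.4.1` constraint.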

Preview (caadd1da72)
diff --git a/pyproject.toml b/pyproject.toml
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -79,7 +79,6 @@
 # See: https://github.com/pytorch/pytorch/issues/166122
 override-dependencies = [
     "nvidia-cudnn-cu12>=9.15",
-    "nvidia-cutlass-dsl>=4.4.1",
     "transformers>=5.1.0.dev0",
     "nvidia-cutlass-dsl>=4.4.0.dev1",
     "quack-kernels>=0.2.7",

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


 vocab_size = config.vocab_size
 hidden_size = config.hidden_size
-intermediate_size = config.intermediate_size
+intermediate_size = getattr(config, "intermediate_size", getattr(config, "moe_intermediate_size", 0))

Perf counter dense MLP uses wrong intermediate size for MoE

Low Severity

The fallback getattr(config, "intermediate_size", getattr(config, "moe_intermediate_size", 0)) sets intermediate_size to moe_intermediate_size for models that lack intermediate_size (like pure MoE models). This value is then used for dense_mlp_params on line 134. For MoE models with some dense layers (e.g., models with first_k_dense_replace), the dense layer intermediate size may differ from moe_intermediate_size, yielding an incorrect FLOP estimate.
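A small demo of the behavior being flagged, using toy configs built from `SimpleNamespace` (the field names follow common Hugging Face conventions; the concrete sizes are made up for illustration):

```python
from types import SimpleNamespace

# A dense model exposes intermediate_size; a pure MoE model may expose
# only moe_intermediate_size.
dense_cfg = SimpleNamespace(intermediate_size=11008)
moe_cfg = SimpleNamespace(moe_intermediate_size=1408)

def resolve(config):
    # The fallback from the diff above: use intermediate_size when
    # present, otherwise fall back to moe_intermediate_size (0 if
    # neither attribute exists).
    return getattr(config, "intermediate_size",
                   getattr(config, "moe_intermediate_size", 0))

print(resolve(dense_cfg))  # 11008
print(resolve(moe_cfg))    # 1408 -- also applied to any dense layers,
                           # which is the mismatch flagged above
```

For hybrid models (e.g. with first_k_dense_replace), the dense layers' true intermediate size can differ from the expert size this fallback returns, so the dense-MLP FLOP term ends up slightly off.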

