Conversation
Bump vLLM to >=0.16.1.dev (nightly), which includes Qwen3.5 model support. This requires torch 2.10 (resolved from the existing >=2.9.0 pin), an updated flash-attn wheel built against torch 2.10, and version overrides for nvidia-cutlass-dsl and quack-kernels.

Bump the transformers pin to 5c1c72b, which includes a rope validation fix for Qwen3.5 (huggingface/transformers#44272).

Add a trainer monkey-patch for a transformers bug where Qwen3.5 passes 3D MRoPE position_ids to decoder layers instead of 2D text_position_ids, which breaks flash attention and causes NaN gradients. The upstream fix is pending: huggingface/transformers#44399

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Duplicate nvidia-cutlass-dsl override renders new entry redundant
- Removed the duplicate nvidia-cutlass-dsl>=4.4.1 constraint, keeping only the >=4.4.0.dev1 entry which allows both dev versions and stable releases.
Or push these changes by commenting:
@cursor push caadd1da72
Preview (caadd1da72)
diff --git a/pyproject.toml b/pyproject.toml
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -79,7 +79,6 @@
# See: https://github.com/pytorch/pytorch/issues/166122
override-dependencies = [
"nvidia-cudnn-cu12>=9.15",
- "nvidia-cutlass-dsl>=4.4.1",
"transformers>=5.1.0.dev0",
"nvidia-cutlass-dsl>=4.4.0.dev1",
    "quack-kernels>=0.2.7",
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
 vocab_size = config.vocab_size
 hidden_size = config.hidden_size
-intermediate_size = config.intermediate_size
+intermediate_size = getattr(config, "intermediate_size", getattr(config, "moe_intermediate_size", 0))
Perf counter dense MLP uses wrong intermediate size for MoE
Low Severity
The fallback getattr(config, "intermediate_size", getattr(config, "moe_intermediate_size", 0)) sets intermediate_size to moe_intermediate_size for models that lack intermediate_size (like pure MoE models). This value is then used for dense_mlp_params on line 134. For MoE models with some dense layers (e.g., models with first_k_dense_replace), the dense layer intermediate size may differ from moe_intermediate_size, yielding an incorrect FLOP estimate.


