Improve text-only decode performance by lucasnewman · Pull Request #1105 · Blaizzy/mlx-vlm

lucasnewman · 2026-05-04T04:09:37Z

Use MRoPE only when we have a vision encoder; otherwise take the standard RoPE fast path, which matches mlx-lm. I tested one model from every model type this touches for correctness, except for glm4v_moe, which was too large for my machine and was tested with synthetic weights instead.

This improves decode performance for e.g. mlx-community/Qwen3.5-0.8B-bf16 from 203 tok/sec to 233 tok/sec (roughly a 15% speedup) in the text-only case on my M5 Max for short generations.
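For readers who want to see why the fast path is safe, here is a minimal pure-Python sketch. It is not the code from this PR. The function names, the `use_mrope` flag, and the contiguous `sections` split are all assumptions made for illustration, and real models may lay out the frequency bands differently. The sketch relies on one property of MRoPE: when the temporal, height, and width position ids of a token are identical, as they are for plain text, the rotation is the same as standard RoPE.

```python
import math


def rope_angles(position, head_dim, base=10000.0):
    """Standard RoPE: one rotation angle per pair of dimensions."""
    half = head_dim // 2
    return [position * base ** (-2.0 * i / head_dim) for i in range(half)]


def mrope_angles(pos_thw, head_dim, sections, base=10000.0):
    """MRoPE sketch: each frequency band reads its position from one axis.

    pos_thw is a (temporal, height, width) triple. `sections` says how many
    frequencies belong to each axis (hypothetical contiguous layout).
    """
    half = head_dim // 2
    assert sum(sections) == half, "sections must cover all frequencies"
    angles = []
    i = 0
    for axis, width in enumerate(sections):
        for _ in range(width):
            angles.append(pos_thw[axis] * base ** (-2.0 * i / head_dim))
            i += 1
    return angles


def rotate(vec, angles):
    """Rotate the pairs (vec[i], vec[i + half]) by angles[i]."""
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i, theta in enumerate(angles):
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def apply_positional_rotation(vec, pos_thw, sections, use_mrope):
    """Dispatch between the full MRoPE path and the plain RoPE fast path.

    The fast path is only valid when all three position axes are equal,
    which is assumed to hold for text-only inputs.
    """
    head_dim = len(vec)
    if not use_mrope:
        return rotate(vec, rope_angles(pos_thw[0], head_dim))
    return rotate(vec, mrope_angles(pos_thw, head_dim, sections))
```

Calling `apply_positional_rotation` with a position such as `(5, 5, 5)` gives the same vector for either value of `use_mrope`, while a position with distinct axes such as `(5, 2, 7)` only makes sense on the MRoPE path.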

Improve text-only decode performance.

52ec6fe

lucasnewman requested a review from Blaizzy May 4, 2026 04:09

lucasnewman closed this May 7, 2026

lucasnewman wants to merge 1 commit into Blaizzy:main from lucasnewman:text-only-decode-perf