
Improve text-only decode performance #1105

Closed

lucasnewman wants to merge 1 commit into Blaizzy:main from lucasnewman:text-only-decode-perf

@lucasnewman (Collaborator)

Use MRoPE only when a vision encoder is present; otherwise, fall back to the standard RoPE fast path, which matches mlx-lm. For correctness, I tested one model from every model type this change touches, except glm4v_moe, which was too large for my machine and was tested with synthetic weights instead.
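Why the fast path can be safe for text-only input: in Qwen2-VL-style MRoPE, the rotary frequencies are split into temporal, height and width sections, each driven by its own position id. For plain text tokens all three ids equal the token's 1D position, so every frequency is rotated by the same angle standard RoPE would use. The pure-Python sketch below checks that equivalence. It is a simplified illustration, not the mlx-vlm implementation: `inv_freqs`, `rope_angles`, `mrope_angles` and `rotate` are hypothetical helpers, the section sizes `[4, 2, 2]` are made up, and channels are rotated in interleaved pairs rather than in whatever layout each model actually uses.

```python
import math


def inv_freqs(dim, base=10000.0):
    # One inverse frequency per rotated pair of channels.
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rope_angles(pos, dim, base=10000.0):
    # Standard 1D RoPE: pair i is rotated by pos * inv_freq[i].
    return [pos * f for f in inv_freqs(dim, base)]


def mrope_angles(pos_thw, sections, dim, base=10000.0):
    # Simplified (hypothetical) MRoPE: consecutive frequency sections use the
    # temporal, height and width position ids, respectively.
    assert sum(sections) == dim // 2
    freqs = inv_freqs(dim, base)
    angles, i = [], 0
    for axis, size in enumerate(sections):
        for _ in range(size):
            angles.append(pos_thw[axis] * freqs[i])
            i += 1
    return angles


def rotate(x, angles):
    # Rotate each channel pair (x[2i], x[2i + 1]) by angles[i].
    out = []
    for i, a in enumerate(angles):
        x0, x1 = x[2 * i], x[2 * i + 1]
        c, s = math.cos(a), math.sin(a)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


dim, sections = 16, [4, 2, 2]
x = [0.1 * (k + 1) for k in range(dim)]

# Text-only: t == h == w == pos, so MRoPE and standard RoPE agree exactly.
for pos in range(8):
    a = rotate(x, rope_angles(pos, dim))
    b = rotate(x, mrope_angles((pos, pos, pos), sections, dim))
    assert a == b

# With distinct t/h/w ids (e.g. an image patch), the two differ.
assert rope_angles(3, dim) != mrope_angles((3, 1, 2), sections, dim)
```

When every section sees the same position, the sectioned bookkeeping is redundant, which is what lets a model skip it and call a single fused 1D RoPE instead.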

For example, this improves text-only decode throughput for mlx-community/Qwen3.5-0.8B-bf16 from 203 tok/s to 233 tok/s on my M5 Max for short generations.
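To put the reported numbers in per-token terms, here is a small calculation, assuming both figures are average steady-state decode rates (prefill excluded). Under that assumption the change is roughly a 15% throughput gain, or about 0.63 ms saved per generated token.

```python
# Decode throughput reported in the PR, in tokens per second.
before_tps = 203.0
after_tps = 233.0

speedup = after_tps / before_tps        # relative throughput gain
before_ms = 1000.0 / before_tps         # mean time per decoded token (ms)
after_ms = 1000.0 / after_tps
saved_ms = before_ms - after_ms

print(f"speedup: {speedup:.3f}x ({(speedup - 1) * 100:.1f}% more tokens/s)")
print(f"per-token latency: {before_ms:.2f} ms -> {after_ms:.2f} ms (-{saved_ms:.2f} ms)")
```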

lucasnewman requested a review from Blaizzy on May 4, 2026 04:09
lucasnewman closed this on May 7, 2026
