Wide performance discrepancy between LM Studio MLX and `mlx_vlm.generate` with Qwen3.5-35B-A3B

When running `Qwen3.5-35B-A3B` ([4-bit MLX](https://huggingface.co/mlx-community/Qwen3.5-35B-A3B-4bit)) on an M4 with a ~3000-token prompt, I noticed the following:

- LM Studio consistently gets around **78 tok/s** token generation and **6s TTFT**
- `mlx_vlm.generate` reaches **108 tok/s** token generation and **2s TTFT**

The gap in token generation speed is almost the same even with very short prompts (<100 tokens).
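
To put the gap in relative terms, here is a small sketch that only does arithmetic on the figures reported above. The implied prefill throughput is an estimate: it assumes the prompt is exactly 3000 tokens (the report says ~3000) and that TTFT is dominated by prompt processing, neither of which was measured separately.

```python
# Figures as reported in this issue; the prompt length is approximate (~3000 tokens).
PROMPT_TOKENS = 3000

reported = {
    "LM Studio": {"gen_tok_s": 78.0, "ttft_s": 6.0},
    "mlx_vlm.generate": {"gen_tok_s": 108.0, "ttft_s": 2.0},
}

lm = reported["LM Studio"]
vlm = reported["mlx_vlm.generate"]

# How much faster mlx_vlm.generate is at token generation.
gen_speedup = vlm["gen_tok_s"] / lm["gen_tok_s"]

# How much longer LM Studio takes to emit the first token.
ttft_ratio = lm["ttft_s"] / vlm["ttft_s"]

# Assumption: TTFT is roughly all prefill, so prompt_tokens / TTFT
# approximates prompt-processing throughput.
prefill_tok_s = {
    name: PROMPT_TOKENS / r["ttft_s"] for name, r in reported.items()
}

print(f"generation speedup: {gen_speedup:.2f}x")
print(f"TTFT ratio: {ttft_ratio:.1f}x")
for name, rate in prefill_tok_s.items():
    print(f"{name}: ~{rate:.0f} prompt tok/s (estimated)")
```

With these numbers, generation is roughly 1.38x faster and TTFT is 3x shorter with `mlx_vlm.generate`.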
