When running Qwen3.5-35B-A3B 4bit MLX on an M4, I noticed that with a ~3000-token prompt:
- LM Studio gets consistently around 78 tok/s token generation and 6s TTFT
- mlx_vlm.generate reaches 108 tok/s token generation and 2s TTFT
The gap in token generation speed is almost the same even with very short prompts (<100 tokens).
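
To put the numbers above in perspective, here is a quick back-of-the-envelope sketch in Python. It only uses the figures I measured (78 vs 108 tok/s, 6s vs 2s TTFT); the helper names and the 500-token reply length are made up for illustration, and the simple "TTFT + tokens / speed" model is an assumption that ignores any other overhead.

```python
# Rough arithmetic on the measured numbers; not a benchmark.

def per_token_ms(tok_per_s: float) -> float:
    """Average time spent per generated token, in milliseconds."""
    return 1000.0 / tok_per_s


def total_seconds(ttft_s: float, tok_per_s: float, n_tokens: int) -> float:
    """Assumed simple model: time to first token plus steady-state decoding."""
    return ttft_s + n_tokens / tok_per_s


# Measured values from the runs described above.
lm_studio_tps, lm_studio_ttft = 78.0, 6.0
mlx_vlm_tps, mlx_vlm_ttft = 108.0, 2.0

# How much faster mlx_vlm.generate decodes, as a ratio.
speedup = mlx_vlm_tps / lm_studio_tps

# Extra time LM Studio spends on each generated token.
overhead_ms = per_token_ms(lm_studio_tps) - per_token_ms(mlx_vlm_tps)

# Hypothetical 500-token reply, chosen only as an example length.
n = 500
lm_studio_total = total_seconds(lm_studio_ttft, lm_studio_tps, n)
mlx_vlm_total = total_seconds(mlx_vlm_ttft, mlx_vlm_tps, n)

print(f"decode speedup: {speedup:.2f}x")
print(f"extra time per token in LM Studio: {overhead_ms:.2f} ms")
print(f"{n}-token reply: LM Studio {lm_studio_total:.1f}s vs mlx_vlm {mlx_vlm_total:.1f}s")
```

So the decode gap works out to roughly 3.5 ms of extra time per token in LM Studio, and since it shows up with short prompts too, it does not look like it depends on prompt length.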