I’m seeing a roughly 10x performance gap when running Z-Image (Q3) on Apple Metal compared to CUDA with the same settings.
I’m new to running image generation models locally, so this may be a configuration or expectation issue.
Hardware
- RTX 1000 Ada (6 GB, CUDA): ~2 s / iteration
- MacBook Air M1 (16 GB, Metal): ~20 s / iteration
Example (5 Iterations)
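As a rough back-of-the-envelope check, here is what the per-iteration timings from the hardware list work out to over 5 iterations. This is only a sketch: the totals are computed from the approximate figures, not measured, and the names `SECONDS_PER_ITERATION` and `total_seconds` are just illustrative.

```python
# Approximate seconds per iteration, taken from the hardware list.
SECONDS_PER_ITERATION = {
    "RTX 1000 Ada (CUDA)": 2.0,
    "MacBook Air M1 (Metal)": 20.0,
}
ITERATIONS = 5


def total_seconds(per_iteration: float, iterations: int) -> float:
    """Estimate sampling time, ignoring model load and decode overhead."""
    return per_iteration * iterations


# Estimated total sampling time per device.
totals = {
    name: total_seconds(seconds, ITERATIONS)
    for name, seconds in SECONDS_PER_ITERATION.items()
}

# Ratio of the Metal estimate to the CUDA estimate.
slowdown = totals["MacBook Air M1 (Metal)"] / totals["RTX 1000 Ada (CUDA)"]

for name, seconds in totals.items():
    print(f"{name}: ~{seconds:.0f} s for {ITERATIONS} iterations")
print(f"Metal is roughly {slowdown:.0f}x slower")
```

The estimate covers only the sampling loop, so real end-to-end times will be somewhat higher on both machines.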
Questions
Is this level of slowdown on Metal expected?
Are there known limitations or recommended optimizations for Metal?