I discovered that compiling the Vulkan backend myself delivers much faster performance #21530

inforithmics · 2026-04-06T20:13:36Z


I noticed this while analyzing the performance of a pull request. The optimization in that pull request should not have made the code faster, yet my build was faster than the precompiled binary from the GitHub release.
I used the same Visual Studio 2022 version and the same Vulkan SDK version. The one difference I found is that the precompiled release ships several CPU backends, while the self-compiled build contains only one.

Setup: gpt-oss 20B from ggml-org, q8_0 quantization, 128k context, token generation on a Radeon VII.
CPU: AMD Ryzen 9 7940HS

| Build | Token generation |
|---|---|
| Precompiled (GitHub release) | 47.92 t/s |
| Self-compiled | 56.59 t/s |
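To put the gap in numbers, here is a small Python sketch that computes the relative speedup from the two token-generation rates in the table. `relative_speedup` is a hypothetical helper written for this illustration, not part of llama.cpp.

```python
def relative_speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Return how much faster `candidate` is than `baseline` as a fraction (0.18 means 18%)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps - 1.0

# Token-generation throughput from the table (tokens per second).
precompiled = 47.92
self_compiled = 56.59

speedup = relative_speedup(precompiled, self_compiled)
print(f"self-compiled is {speedup:.1%} faster")  # prints: self-compiled is 18.1% faster
```

A ratio of rates is used rather than a difference so the figure stays comparable across models and context sizes.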

This is a large performance difference (about 18%). Could it be that the wrong CPU backend is loaded, or that the ggml-vulkan backend is compiled with optimizations for the host CPU, causing such a big performance difference?

Self-compiled system_info:

system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
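One way to check whether the two builds run with the same CPU features is to compare the flags each one reports in its `system_info` line. A minimal Python sketch, assuming the line keeps the `CPU : NAME = value | ...` layout shown above and that any later `NAME :` label starts a different section; `parse_cpu_features` and `diff_features` are hypothetical helpers, not llama.cpp APIs.

```python
import re

def parse_cpu_features(system_info: str) -> dict:
    """Map each flag in the 'CPU :' section of a system_info line to its value string."""
    # Drop everything before "CPU :" so thread counts are not mistaken for flags.
    _, found, cpu_part = system_info.partition("CPU :")
    if not found:
        return {}
    flags = {}
    for chunk in cpu_part.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # the trailing "|" leaves an empty chunk
        if ":" in chunk:
            break  # assumption: a "NAME :" label starts another section
        match = re.fullmatch(r"([A-Za-z0-9_]+)\s*=\s*(\S+)", chunk)
        if match:
            flags[match.group(1)] = match.group(2)
    return flags

def diff_features(a: dict, b: dict) -> dict:
    """Return {flag: (value_in_a, value_in_b)} for flags that differ; absent flags count as "0"."""
    return {
        name: (a.get(name, "0"), b.get(name, "0"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "0") != b.get(name, "0")
    }

# The self-compiled line from this post.
self_compiled = parse_cpu_features(
    "system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | "
    "AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |"
)
print(sorted(self_compiled))
```

Pasting the precompiled build's `system_info` line into `parse_cpu_features` and passing both results to `diff_features` lists exactly the flags on which the two builds disagree.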

Precompiled system_info (Zen4 CPU backend loaded):

Self-compiled build commands:
cmake .. -G "Visual Studio 17 2022" -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
