I discovered that compiling the Vulkan backend myself delivers much faster performance #21530
Unanswered
inforithmics
asked this question in
Q&A
Replies: 0 comments
While analyzing the performance of a pull request, I found that its optimization should not have led to faster code, yet my build was still faster than the precompiled binary from the GitHub release.
I used the same Visual Studio 2022 version and the same Vulkan SDK version. However, I noticed that the precompiled version ships several CPU backends, while the self-compiled version has only one.
Token generation on a Radeon VII (128k context, q8_0 quantization, gpt-oss 20B from ggml-org). The CPU is an AMD Ryzen 9 7940HS.

| Precompiled version | Self-compiled version |
| --- | --- |
| 47.92 t/s | 56.59 t/s |
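To put the two figures in proportion, here is a minimal Python sketch of the relative gain. It assumes the first figure (47.92 t/s) belongs to the precompiled release build and the second (56.59 t/s) to the self-compiled build, which matches the description above. The function name `relative_speedup` is just an illustrative helper.

```python
# Token-generation throughput reported above, in tokens per second.
# Assumption: the first value is the precompiled build, the second the self-compiled one.
precompiled_tps = 47.92
self_compiled_tps = 56.59


def relative_speedup(baseline: float, candidate: float) -> float:
    """Return how much faster `candidate` is than `baseline`, as a fraction."""
    return candidate / baseline - 1.0


speedup = relative_speedup(precompiled_tps, self_compiled_tps)
print(f"self-compiled build is {speedup:.1%} faster")  # about 18%
```

This is a single measurement per build, so the percentage only describes these two runs and says nothing about run-to-run variance.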
This is a massive performance difference. Could it be that the wrong CPU backend is loaded, or that the ggml-vulkan backend is optimized for the CPU, causing such a big performance difference?
Self-compiled system_info:
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Precompiled system_info (Zen4 backend loaded):
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
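The two system_info lines are long, so the difference is easier to see with a small diff. The following Python sketch parses the `NAME = VALUE` pairs after `CPU :` in each line and lists the flags that appear in only one build. The helper name `parse_flags` is hypothetical and the parsing assumes the line format shown above.

```python
import re


def parse_flags(line: str) -> dict[str, int]:
    """Extract the NAME = VALUE feature flags that follow 'CPU :' in a system_info line."""
    _, _, cpu_part = line.partition("CPU :")
    return {name: int(value) for name, value in re.findall(r"(\w+) = (\d+)", cpu_part)}


# The two system_info lines copied from the post.
self_compiled = (
    "system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | "
    "AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |"
)
precompiled = (
    "system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | "
    "AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | "
    "AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |"
)

self_flags = parse_flags(self_compiled)
pre_flags = parse_flags(precompiled)

# Flags reported by one build but not the other, in the order they appear.
only_precompiled = [name for name in pre_flags if name not in self_flags]
only_self_compiled = [name for name in self_flags if name not in pre_flags]

print("only in precompiled:", only_precompiled)
print("only in self-compiled:", only_self_compiled)
```

For these two lines, the precompiled build additionally reports BMI2, AVX512_VBMI, AVX512_VNNI and AVX512_BF16, and the self-compiled build reports nothing that the precompiled one lacks.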
Compiled with this:
cmake .. -G "Visual Studio 17 2022" -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release