Git commit
07ac3ce Merge pull request #25 from chimpera/fix-argmax-topk-64
Operating systems
Mac
GGML backends
Metal
Problem description & steps to reproduce
On macOS (-DGGML_METAL=ON), the embedded Metal library fails to compile at startup. quantize_turbo4_0 and
turbo4_dequantize_full_block in ggml/src/ggml-metal/ggml-metal.metal reference dst.signs and xb->rnorm, but block_turbo4_0 in
ggml/src/ggml-common.h:335-339 only defines norm and qs. The Metal kernels look like turbo3 code copy-pasted under the turbo4 name, whereas the CUDA path
in ggml/src/ggml-cuda/turbo-quant-cuda.cuh:498 implements the correct 4-bit-only layout. Because the whole Metal library fails to build, GPU offload
is broken for every model and cache type on macOS, not just turbo4.
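To illustrate the mismatch, here is a minimal C++17 sketch of a compile-time check that would flag this kind of drift between a block struct and the kernels that use it. Everything in it is an assumption for illustration: the two structs are hypothetical stand-ins (the field types and array sizes are placeholders, not the real definitions in ggml-common.h), and has_signs / has_rnorm are helper names I made up, not part of ggml.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for block_turbo4_0. The report only establishes that
// the real struct has `norm` and `qs`; types and sizes here are placeholders.
struct block_turbo4_0_sketch {
    std::uint16_t norm;   // placeholder for a half-precision value
    std::uint8_t  qs[64]; // placeholder size for packed 4-bit values
};

// Hypothetical layout matching what the Metal kernels appear to expect,
// i.e. one that additionally carries `rnorm` and `signs`.
struct block_with_signs_sketch {
    std::uint16_t norm;
    std::uint16_t rnorm;
    std::uint8_t  qs[32];
    std::uint8_t  signs[16];
};

// Detection idiom: true only if T has a member named `signs`.
template <typename T, typename = void>
struct has_signs : std::false_type {};
template <typename T>
struct has_signs<T, std::void_t<decltype(std::declval<T&>().signs)>>
    : std::true_type {};

// Same idea for a member named `rnorm`.
template <typename T, typename = void>
struct has_rnorm : std::false_type {};
template <typename T>
struct has_rnorm<T, std::void_t<decltype(std::declval<T&>().rnorm)>>
    : std::true_type {};

// A kernel written against the 4-bit-only layout could assert that it is not
// relying on fields the struct does not have.
static_assert(!has_signs<block_turbo4_0_sketch>::value,
              "4-bit-only layout should not expose signs");
static_assert(!has_rnorm<block_turbo4_0_sketch>::value,
              "4-bit-only layout should not expose rnorm");
```

A check like this only compiles on the host side. It would not replace fixing the Metal kernels, but it might catch a struct/kernel mismatch before the runtime Metal compile does.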
First Bad Commit
No response
Compile command
cmake -B build -DGGML_METAL=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j
./build/bin/llama-server --version
Relevant log output
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3
program_source:3714:49: error: no member named 'signs' in 'block_turbo4_0'
for (int j = 0; j < QK_TURBO4 / 8; j++) dst.signs[j] = 0;
program_source:3746:9: error: no member named 'rnorm' in 'block_turbo4_0'; did you mean 'norm'?
dst.rnorm = half(sqrt(rnorm_sq));
program_source:3752:17: error: no member named 'signs' in 'block_turbo4_0'
program_source:3859:35: error: no member named 'rnorm' in 'block_turbo4_0'
program_source:3880:27: error: no member named 'signs' in 'block_turbo4_0'
program_source:12896:50: error: no member named 'signs' in 'block_turbo4_0'
... in instantiation of kernel_set_rows_turbo<int64_t, block_turbo4_0, ...>
ggml_metal_device_init: error: failed to create library