Open
Description
Thanks for this library! It is exactly what I was looking for.
I've been trying to get Dia working with Metal. It runs normally without the flag (albeit very, very slowly), but it crashes when I pass --use-metal. I built a debug binary using these steps:
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
Then I ran it under lldb to get a backtrace:
❯ lldb -- build/bin/tts-cli --model-path ~/Downloads/Dia_Q4.gguf --prompt "[S1] Hi, I am Dia. All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood." --save-path ~/Downloads/test2.wav --topk 35 --temperature 1.3 --use-metal
(lldb) target create "build/bin/tts-cli"
Current executable set to '/Users/njero/Code/examples/TTS.cpp/build/bin/tts-cli' (arm64).
(lldb) settings set -- target.run-args "--model-path" "/Users/njero/Downloads/Dia_Q4.gguf" "--prompt" "[S1] Hi, I am Dia. All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood." "--save-path" "/Users/njero/Downloads/test2.wav" "--topk" "35" "--temperature" "1.3" "--use-metal"
(lldb) run
Process 33871 launched: '/Users/njero/Code/examples/TTS.cpp/build/bin/tts-cli' (arm64)
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: using embedded metal library
2025-09-21 22:47:16.602905-0400 tts-cli[33871:15248153] [Metal Compiler Warning] Warning: Compilation succeeded with:
program_source:480:28: warning: unused variable 'ksigns64' [-Wunused-const-variable]
GGML_TABLE_BEGIN(uint64_t, ksigns64, 128)
^
ggml_metal_init: GPU name: Apple M2
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
... Many more metal init lines ...
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M2
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
ggml_metal_init: loaded kernel_add 0x134685760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_add_row 0x1346859c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub 0x134685c20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub_row 0x134685e80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul 0x1346860e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_row 0x134686340 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div 0x1346865a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div_row 0x134686800 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f32 0x134686a60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f16 0x134686ce0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i32 0x134686f40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i16 0x1346871a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale 0x134687710 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale_4 0x134687c80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_clamp 0x134688250 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_tanh 0x134688730 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_relu 0x134688c10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sigmoid 0x1346890f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu 0x1346895d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_4 0x134689d40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick 0x13468a220 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick_4 0x13468a700 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu 0x114605910 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu_4 0x114605df0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_elu 0x1146065d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16 0x114606b30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16_4 0x114607090 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32 0x1146075f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32_4 0x114607b50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf 0x1146080b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf_8 0x114608610 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f32 0x114608b70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f16 0x1146090d0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: loaded kernel_get_rows_q4_0 0x114609630 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_1 0x114609b90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_0 0x11460a0f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_1 0x11460a650 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q8_0 0x11460abb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q2_K 0x11460b110 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q3_K 0x11460b670 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_K 0x11460bbd0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_K 0x11460c130 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q6_K 0x11460c690 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xxs 0x11460cbf0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xs 0x11460d150 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_xxs 0x11460d6b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_s 0x11460dc10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_s 0x11460e170 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_s 0x11460e6d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_m 0x11460ec30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_nl 0x11460f190 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_xs 0x11460f6f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_i32 0x11460fc50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rms_norm 0x1146101b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_group_norm 0x114610710 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_norm 0x114610c70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_conv_f32 0x1146111d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_scan_f32 0x114611730 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f32_f32 0x114611ce0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_mul_mv_f16_f32 0x114612290 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row 0x114612840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4 0x114612df0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f16 0x1146133a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32 0x114613950 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32 0x114613f00 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_0_f32 0x1146144b0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_1_f32 0x114614a60 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32 0x114615010 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32 0x1146155c0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32 0x114615b70 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32 0x114616120 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32 0x1146166d0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32 0x114616c80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xxs_f32 0x114617230 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xs_f32 0x1146177e0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_xxs_f32 0x114617d90 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_s_f32 0x114618340 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_s_f32 0x1146188f0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_s_f32 0x114618ea0 | th_max = 448 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_m_f32 0x114619450 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_nl_f32 0x114619a00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_xs_f32 0x114619fb0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f32_f32 0x11461a560 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f16_f32 0x11461ab10 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mv_id_q4_0_f32 0x11461b0c0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_1_f32 0x11461b670 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_0_f32 0x11461bc20 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_1_f32 0x11461c1d0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q8_0_f32 0x11461c780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q2_K_f32 0x11461cd30 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q3_K_f32 0x11461d2e0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_K_f32 0x11461d890 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_K_f32 0x11461de40 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q6_K_f32 0x11461e3f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xxs_f32 0x11461e9a0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xs_f32 0x11461ef50 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_xxs_f32 0x11461f500 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_s_f32 0x11461fab0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_s_f32 0x114620060 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_s_f32 0x114620610 | th_max = 448 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_m_f32 0x114620bc0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_nl_f32 0x114621170 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_xs_f32 0x114621720 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f32_f32 0x114621cd0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x114622280 | th_max = 832 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32 0x114622830 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32 0x114622de0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_0_f32 0x114623390 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_1_f32 0x114623940 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32 0x114623ef0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32 0x1146244a0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32 0x114624a50 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32 0x114625000 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32 0x1146255b0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32 0x114625b60 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xxs_f32 0x114626110 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xs_f32 0x1146266c0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_xxs_f32 0x114626c70 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_s_f32 0x114627220 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_s_f32 0x1146277d0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_s_f32 0x114627d80 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_m_f32 0x114628330 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_nl_f32 0x1146288e0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_xs_f32 0x114628e90 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f32_f32 0x114629440 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f16_f32 0x1146299f0 | th_max = 832 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_id_q4_0_f32 0x114629fa0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_1_f32 0x11462a550 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_0_f32 0x11462ab00 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_1_f32 0x11462b0b0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q8_0_f32 0x11462b660 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q2_K_f32 0x11462bc10 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q3_K_f32 0x11462c1c0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_K_f32 0x11462c770 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_K_f32 0x11462cd20 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q6_K_f32 0x11462d2d0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xxs_f32 0x11462d880 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xs_f32 0x11462de30 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_xxs_f32 0x11462e3e0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_s_f32 0x11462e990 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_s_f32 0x11462ef40 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_s_f32 0x11462f4f0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_m_f32 0x11462faa0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_nl_f32 0x114630050 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_xs_f32 0x114630600 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f32 0x114630c60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f16 0x1146312c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f32 0x114631920 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f16 0x114631f80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f32 0x114632530 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f16 0x114632ae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f16 0x114633040 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f32 0x1146335a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f16 0x114633b00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f32 0x114634060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_upscale_f32 0x1146345c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_f32 0x114634b20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_timestep_embedding_f32 0x114635080 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_arange_f32 0x1146355e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_asc 0x114635b40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_desc 0x134689830 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_leaky_relu_f32 0x13468aef0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h64 0x13468b150 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h80 0x13468b3b0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h96 0x13468b610 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h112 0x13468b870 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h128 0x13468bad0 | th_max = 512 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h256 0x13468bd30 | th_max = 512 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h64 0x13468bf90 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h80 0x13468c1f0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h96 0x13468c450 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h112 0x13468c860 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h128 0x13468cac0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h256 0x13468cd20 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h64 0x13468cf80 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h80 0x13468d1e0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h96 0x13468d440 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h112 0x13468d6a0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h128 0x13468d900 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h256 0x13468db60 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h64 0x13468df70 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h80 0x13468e1d0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h96 0x13468e430 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h112 0x13468e690 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h128 0x13468e8f0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h256 0x13468eb50 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h64 0x13468edb0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h80 0x13468f010 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h96 0x13468f270 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h112 0x13468f680 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h128 0x13468f8e0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h256 0x13468fb40 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h64 0x13468fda0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h80 0x134690000 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h96 0x134690260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h112 0x1346904c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h128 0x134690720 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h256 0x134690980 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h128 0x134690d90 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h128 0x134690ff0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h128 0x134691250 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h128 0x1346914b0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h128 0x134691710 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h128 0x134691970 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h256 0x134691bd0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h256 0x134691e30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h256 0x134692090 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h256 0x1346924a0 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h256 0x134692700 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h256 0x134692960 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_cpy_i32_i32 0x134692c50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f32 0x134692eb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f16 0x134693110 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f16_f32 0x134693370 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f16 0x1346935d0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f32_q8_0 0x134693830 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_0 0x134693a90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_1 0x134693cf0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_0 0x134693f50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_1 0x1346941b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_iq4_nl 0x134694410 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_concat 0x134694670 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqr 0x134694b80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqrt 0x134695060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sin 0x134695540 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_reciprocal 0x134695a20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cos 0x134695f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sum_rows 0x134696160 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_avg_f32 0x1346963c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_max_f32 0x134696620 | th_max = 1024 | th_width = 32
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M2
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
ggml_metal_init: loaded kernel_add 0x124608760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_add_row 0x1246089c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub 0x124608c20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sub_row 0x13462d260 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul 0x134696880 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_row 0x134696ae0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div 0x134696d40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_div_row 0x134696fa0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f32 0x1346972c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_f16 0x134697520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i32 0x134697780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_repeat_i16 0x1346979e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale 0x114635da0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_scale_4 0x114636310 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_clamp 0x1146368e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_tanh 0x114636dc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_relu 0x1146372a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sigmoid 0x114637780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu 0x114637c60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_4 0x1146383d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick 0x1146388b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_gelu_quick_4 0x114638d90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu 0x114639270 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_silu_4 0x114639750 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_elu 0x114639c30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16 0x114639e90 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f16_4 0x11463a0f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32 0x11463a350 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_soft_max_f32_4 0x13471e7b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf 0x124608f00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_diag_mask_inf_8 0x124609160 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f32 0x1246094e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_f16 0x124609a40 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: loaded kernel_get_rows_q4_0 0x114637ec0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_1 0x11463a5b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_0 0x11463a810 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_1 0x11463aa70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q8_0 0x11463acd0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q2_K 0x11463af30 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q3_K 0x11463b190 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q4_K 0x11463b3f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q5_K 0x11463b650 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_q6_K 0x11463b8b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xxs 0x11463bb10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_xs 0x13470fbb0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_xxs 0x134724d20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq3_s 0x134720710 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq2_s 0x13470aba0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_s 0x134708840 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq1_m 0x1347211c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_nl 0x134721420 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_iq4_xs 0x134728e00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_get_rows_i32 0x134697c40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rms_norm 0x134729060 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_group_norm 0x1347292c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_norm 0x134729520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_conv_f32 0x134729780 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_ssm_scan_f32 0x1347299e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f32_f32 0x134729c40 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_mul_mv_f16_f32 0x134729ea0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row 0x13472a100 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4 0x13472a360 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_f16_f16 0x13472a5c0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32 0x13472a820 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32 0x124609dc0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_0_f32 0x12460a4f0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_1_f32 0x12460aaa0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32 0x11463bd70 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32 0x11463bfd0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32 0x11463c230 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32 0x11463c490 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32 0x11463c6f0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32 0x11463c950 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xxs_f32 0x11463cbb0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_xs_f32 0x11463ce10 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_xxs_f32 0x11463d070 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq3_s_f32 0x11463d2d0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq2_s_f32 0x11463d530 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_s_f32 0x11463d790 | th_max = 448 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq1_m_f32 0x11463d9f0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_nl_f32 0x13472aa80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_iq4_xs_f32 0x13472ace0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f32_f32 0x13472af40 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_f16_f32 0x13472b1a0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mv_id_q4_0_f32 0x13472b400 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_1_f32 0x13472b660 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_0_f32 0x13472b8c0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_1_f32 0x13472bb20 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q8_0_f32 0x13472bd80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q2_K_f32 0x13472bfe0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q3_K_f32 0x13472c240 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q4_K_f32 0x11463dc50 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q5_K_f32 0x11463deb0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_q6_K_f32 0x12460b050 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xxs_f32 0x12460b600 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_xs_f32 0x12460bbb0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_xxs_f32 0x12460c160 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq3_s_f32 0x12460c5a0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq2_s_f32 0x12460ccd0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_s_f32 0x13472c4a0 | th_max = 448 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq1_m_f32 0x13472c700 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_nl_f32 0x13472c960 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mv_id_iq4_xs_f32 0x11463e110 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f32_f32 0x11463e370 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_f16_f32 0x12460d110 | th_max = 832 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32 0x12460d840 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32 0x12460ddf0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_0_f32 0x12460e3a0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_1_f32 0x12460e950 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32 0x12460ef00 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32 0x134697ea0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32 0x134698100 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32 0x11463e5d0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32 0x11463e830 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32 0x11463ea90 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xxs_f32 0x11463ecf0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_xs_f32 0x12460f340 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_xxs_f32 0x12460fa70 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq3_s_f32 0x124610020 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq2_s_f32 0x13472cbc0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_s_f32 0x13472ce20 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq1_m_f32 0x13472d080 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_nl_f32 0x13472d2e0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_iq4_xs_f32 0x13472d540 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f32_f32 0x134698360 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_f16_f32 0x1346985c0 | th_max = 832 | th_width = 32
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: loaded kernel_mul_mm_id_q4_0_f32 0x134698820 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_1_f32 0x134698a80 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_0_f32 0x134698ce0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_1_f32 0x134698f40 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q8_0_f32 0x1346991a0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q2_K_f32 0x11463ef50 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q3_K_f32 0x11463f1b0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q4_K_f32 0x134699400 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q5_K_f32 0x134699660 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_q6_K_f32 0x1346998c0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xxs_f32 0x134699b20 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_xs_f32 0x134699d80 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_xxs_f32 0x134699fe0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq3_s_f32 0x13472d7a0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq2_s_f32 0x13469a240 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_s_f32 0x13469a4a0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq1_m_f32 0x13469a700 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_nl_f32 0x13469a960 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_mul_mm_id_iq4_xs_f32 0x13469abc0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f32 0x13469ae20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_norm_f16 0x13469b080 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f32 0x13469b2e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_rope_neox_f16 0x13469b540 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f32 0x13469b860 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_conv_transpose_1d_f16 0x13469bac0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f16 0x13469bd20 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_f32 0x13469bf80 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f16 0x13469c1e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_im2col_ext_f32 0x13469c440 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_upscale_f32 0x13469c6a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pad_f32 0x13469c900 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_timestep_embedding_f32 0x13469cb60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_arange_f32 0x13469cdc0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_asc 0x13469d020 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_argsort_f32_i32_desc 0x13469d280 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_leaky_relu_f32 0x13469d7f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h64 0x13469da50 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h80 0x13469dcb0 | th_max = 640 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h96 0x13469df10 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h112 0x13469e170 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h128 0x13469e3d0 | th_max = 512 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_f16_h256 0x13469e630 | th_max = 512 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h64 0x13469e890 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h80 0x13469eaf0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h96 0x13469ed50 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h112 0x13469f160 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h128 0x13469f3c0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_0_h256 0x13469f620 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h64 0x13469f880 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h80 0x13469fae0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h96 0x13469fd40 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h112 0x13469ffa0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h128 0x1346a0200 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q4_1_h256 0x1346a0460 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h64 0x1346a0870 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h80 0x1346a0ad0 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h96 0x1346a0d30 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h112 0x1346a0f90 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h128 0x1346a11f0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_0_h256 0x1346a1450 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h64 0x1346a16b0 | th_max = 576 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h80 0x1346a1910 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h96 0x1346a1b70 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h112 0x1346a1f80 | th_max = 832 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h128 0x1346a2610 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q5_1_h256 0x1346a2870 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h64 0x1346a2ad0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h80 0x1346a3220 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h96 0x1346a3750 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h112 0x1346a3b60 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h128 0x1346a42e0 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_q8_0_h256 0x1346a4810 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h128 0x1346a4c20 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h128 0x1346a53a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h128 0x1346a58d0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h128 0x1346a5ce0 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h128 0x1346a6460 | th_max = 768 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h128 0x1346a6990 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_f16_h256 0x1346a6da0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_0_h256 0x1346a7520 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q4_1_h256 0x1346a7a50 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_0_h256 0x1346a7e60 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q5_1_h256 0x1346a85e0 | th_max = 704 | th_width = 32
ggml_metal_init: loaded kernel_flash_attn_ext_vec_q8_0_h256 0x1346a8b10 | th_max = 896 | th_width = 32
ggml_metal_init: loaded kernel_cpy_i32_i32 0x1346a9430 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f32 0x1346a99e0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_f16 0x1346a9f90 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f16_f32 0x1346aa540 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f16_f16 0x1346aaaf0 | th_max = 1024 | th_width = 32
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
ggml_metal_init: loaded kernel_cpy_f32_q8_0 0x1346ab0a0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_0 0x1346ab650 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q4_1 0x1346abc00 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_0 0x1346ac1b0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_q5_1 0x1346ac760 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cpy_f32_iq4_nl 0x1346acd10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_concat 0x1346ad370 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqr 0x1346adb50 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sqrt 0x1346ae330 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sin 0x1346aeb10 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_reciprocal 0x1346af2f0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_cos 0x1346afad0 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_sum_rows 0x1346b0030 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_avg_f32 0x1346b0590 | th_max = 1024 | th_width = 32
ggml_metal_init: loaded kernel_pool_2d_max_f32 0x1346b0af0 | th_max = 1024 | th_width = 32
ggml_gallocr_reserve_n: reallocating Metal buffer from size 0.00 MiB to 5079.38 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
ggml_gallocr_reserve_n: reallocating Metal buffer from size 0.00 MiB to 180.01 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 4.01 MiB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_needs_realloc: node cache_k_l0 (view) (permuted) (cont) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: node cache_k_l0 (view) (permuted) (cont) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_needs_realloc: node cache_k_l0 (view) (permuted) (cont) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
... hundreds more of this ...
ggml_gallocr_needs_realloc: node cache_k_l0 (view) (permuted) (cont) is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
Process 33871 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x50)
frame #0: 0x0000000100170784 tts-cli`ggml_metal_get_buffer(t=0x00000001386f5e10, offs=0x000000016fdfd4f0) at ggml-metal.m:937:111
934
935 ggml_backend_buffer_t buffer = t->view_src ? t->view_src->buffer : t->buffer;
936
-> 937 struct ggml_backend_metal_buffer_context * buf_ctx = (struct ggml_backend_metal_buffer_context *) buffer->context;
938
939 // find the view that contains the tensor fully
940 for (int i = 0; i < buf_ctx->n_buffers; ++i) {
Target 0: (tts-cli) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x50)
* frame #0: 0x0000000100170784 tts-cli`ggml_metal_get_buffer(t=0x00000001386f5e10, offs=0x000000016fdfd4f0) at ggml-metal.m:937:111
frame #1: 0x000000010015132c tts-cli`ggml_metal_encode_node(backend=0x0000600001b6c000, idx=125, encoder=0x0000600000736be0) at ggml-metal.m:1191:36
frame #2: 0x0000000100150ab8 tts-cli`__ggml_backend_metal_set_n_cb_block_invoke(.block_descriptor=0x0000600003bb6df0, iter=1) at ggml-metal.m:4131:25
frame #3: 0x0000000100170cc8 tts-cli`ggml_metal_graph_compute(backend=0x0000600001b6c000, gf=0x0000000134c63e68) at ggml-metal.m:3726:13
frame #4: 0x0000000100170958 tts-cli`ggml_backend_metal_graph_compute(backend=0x0000600001b6c000, cgraph=0x0000000134c63e68) at ggml-metal.m:4111:12
frame #5: 0x000000010018ded0 tts-cli`ggml_backend_graph_compute_async(backend=0x0000600001b6c000, cgraph=0x0000000134c63e68) at ggml-backend.cpp:332:12
frame #6: 0x0000000100191600 tts-cli`ggml_backend_sched_compute_splits(sched=0x0000000134c60a00) at ggml-backend.cpp:1393:35
frame #7: 0x00000001001912c8 tts-cli`ggml_backend_sched_graph_compute_async(sched=0x0000000134c60a00, graph=0x00000001386a8020) at ggml-backend.cpp:1584:12
frame #8: 0x0000000100077fcc tts-cli`dac_runner::run(this=0x0000600003b0aac0, input_tokens=0x000000013516a400, sequence_length=821, outputs=0x000000016fdfddc0) at dac_model.cpp:203:5
frame #9: 0x00000001000906dc tts-cli`dia_runner::generate_from_batch(this=0x0000600001606e90, batch=0x000000016fdfdbb8, output=0x000000016fdfddc0) at model.cpp:868:17
frame #10: 0x00000001000909a4 tts-cli`dia_runner::generate(this=0x0000600001606e90, sentence="[S1] Hi, I am Dia. All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.", output=0x000000016fdfddc0, config=0x000000016fdfdf30) at model.cpp:891:5
frame #11: 0x0000000100004dc0 tts-cli`main(argc=12, argv=0x000000016fdff268) at cli.cpp:86:13
frame #12: 0x000000019c0eab98 dyld`start + 6076
(lldb)
I'm still digging into this, but I wondered whether it is a known limitation of the Metal backend.
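One reading of the fault address, which I have not confirmed against the ggml sources: `address=0x50` at `buffer->context` (ggml-metal.m:937) is what you would see if `buffer` itself were NULL and `context` sat 0x50 bytes into the struct. The sketch below only illustrates that arithmetic. `struct demo_buffer`, its fields, and both functions are hypothetical stand-ins, not the real ggml layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a backend buffer. This is NOT the real ggml
 * struct; the fields are chosen so that `context` lands at offset 0x50
 * on a target with 8-byte pointers. */
struct demo_buffer {
    void *callbacks[9];  /* 9 pointers: 72 bytes with 8-byte pointers */
    void *buffer_type;   /* next pointer, ends at byte 80 */
    void *context;       /* offset 80 == 0x50 with 8-byte pointers */
};

/* Address the CPU touches when it reads `context` through a buffer
 * pointer whose numeric value is `base`. */
static uintptr_t context_field_address(uintptr_t base) {
    return base + offsetof(struct demo_buffer, context);
}

/* With base == 0 (a NULL buffer pointer) the access lands exactly at the
 * field offset, which is the pattern in the crash report above. */
static void check_fault_address(void) {
    assert(context_field_address(0) == 10 * sizeof(void *));
}
```

If that reading holds, the tensor at node `idx=125` reached the Metal encoder with neither `t->buffer` nor `t->view_src->buffer` set, which would fit the repeated "failed to allocate graph" messages earlier in the log.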