Commit 502532a
Ralf Waldukat
Update to llama.cpp 2026-01-01
- Update llama.cpp submodule (2025-08-14 → 2026-01-01)
- Remove deprecated KV cache functions (use llama_memory_* instead)
- Remove llama_sampler_init_softmax (deprecated)
- Add LLAMA_ROPE_TYPE_IMROPE constant
- Add llama_flash_attn_type enum (AUTO/DISABLED/ENABLED)
- Add llama_params_fit_status enum
- Add llama_model_meta_key enum for sampling metadata
- Add llama_model_params fields: no_host, no_alloc
- Replace llama_context_params.flash_attn bool with flash_attn_type enum
- Add 15 new API functions:
- llama_max_tensor_buft_overrides
- llama_n_ctx_seq
- llama_model_n_embd_inp
- llama_model_is_hybrid
- llama_flash_attn_type_name
- llama_model_meta_key_str
- llama_adapter_meta_* functions (5)
- llama_log_get/set
- llama_memory_breakdown_print
- Add ggml_log_callback typedef
- Disable LLAVA build (CMake incompatibility in upstream mtmd)
- Bump version 0.3.16 → 0.4.0
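Among the new functions, llama_log_set pairs with the newly exposed ggml_log_callback typedef. A minimal sketch of installing a custom log sink, assuming the upstream signature `void (*)(enum ggml_log_level, const char *, void *)` and the GGML_LOG_LEVEL_* constants from ggml.h (not restated in this commit message):

```c
#include "llama.h"  // also pulls in ggml.h for ggml_log_callback / ggml_log_level
#include <stdio.h>

// Custom log sink: forward llama.cpp log output to stderr, filtering by level.
static void my_log_cb(enum ggml_log_level level, const char * text, void * user_data) {
    (void) user_data;                    // unused here
    if (level >= GGML_LOG_LEVEL_WARN) {  // drop DEBUG/INFO chatter
        fprintf(stderr, "[llama] %s", text);
    }
}

int main(void) {
    // Install the callback before any other llama.cpp call so that early
    // backend-initialization messages are captured as well.
    llama_log_set(my_log_cb, NULL);
    /* ... load model, create context, etc. ... */
    return 0;
}
```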
Breaking changes:
- flash_attn bool removed, use flash_attn_type enum
- KV cache functions removed, use llama_memory_* API
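A migration sketch for both breaking changes, assuming the enum values and llama_memory_* entry points named above match upstream llama.cpp (llama_get_memory / llama_memory_clear are the upstream replacements for the removed KV-cache calls):

```c
#include "llama.h"

// Before (<= 0.3.16):
//   ctx_params.flash_attn = true;
//   llama_kv_cache_clear(ctx);

// After (0.4.0): flash attention is a tri-state enum on the context params.
struct llama_context_params ctx_params = llama_context_default_params();
ctx_params.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_ENABLED; // or _AUTO / _DISABLED

// KV-cache manipulation now goes through the memory API:
struct llama_context * ctx = /* llama_init_from_model(...) */ NULL;
llama_memory_t mem = llama_get_memory(ctx);
llama_memory_clear(mem, /*data=*/true); // drop cached tokens and free their data
```

Preferring LLAMA_FLASH_ATTN_TYPE_AUTO lets the backend decide per device, which is the closest behavioural match to the old default.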
Tested with Nemotron-3-Nano-30B hybrid model.

1 parent: c37132b
File tree (4 files changed: +159 −260)
- llama_cpp
- vendor