@loci-dev

Mirrored from ggml-org/llama.cpp#18357

Summary

Add a minimal extension point for custom memory (KV cache) implementations.

Motivation

  • KV cache optimization is an active research area (compression, semantic caching, etc.)
  • Experimenting with custom implementations currently requires forking llama.cpp
  • GGML backends already use similar factory patterns

Changes

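  • Add a llama_memory_factory_fn callback typedef to llama.h
  • Add llama_set_memory_factory() to register a custom factory together with a user_data pointer that is passed back to it
  • In the llama_context constructor, call the registered factory (if any) before falling back to default memory creation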
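The control flow behind these three changes can be modeled in a standalone C++ sketch. It deliberately does not include llama.h: Model, ContextParams, Memory, DefaultMemory, Context, MemoryFactoryFn and set_memory_factory are hypothetical stand-ins for llama_model, llama_context_params, the object behind llama_memory_t, the default memory, llama_context, llama_memory_factory_fn and llama_set_memory_factory(). It assumes the factory and its user_data are stored process-wide and that the context takes ownership of whatever the factory returns; the actual patch may differ in these details.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in types for this sketch only (NOT llama.cpp's definitions).
struct Model {};
struct ContextParams {};

// Stand-in for the memory interface that llama_memory_t points to.
struct Memory {
    virtual ~Memory() = default;
    virtual const char * name() const = 0;
};

struct DefaultMemory : Memory {
    const char * name() const override { return "default"; }
};

// Same shape as the proposed factory: (model, params, user_data) -> memory or nullptr.
using MemoryFactoryFn = Memory * (*)(const Model *, const ContextParams *, void *);

// Assumed process-wide registration, mirroring llama_set_memory_factory(fn, user_data).
static MemoryFactoryFn g_memory_factory    = nullptr;
static void *          g_memory_factory_ud = nullptr;

void set_memory_factory(MemoryFactoryFn fn, void * user_data) {
    g_memory_factory    = fn;
    g_memory_factory_ud = user_data;
}

// Stand-in for the memory-creation step of the llama_context constructor.
struct Context {
    std::unique_ptr<Memory> memory;  // assumed: the context owns the returned object

    Context(const Model & model, const ContextParams & params) {
        Memory * mem = nullptr;
        // When no factory is registered, this null check is the only extra work.
        if (g_memory_factory != nullptr) {
            mem = g_memory_factory(&model, &params, g_memory_factory_ud);
        }
        if (mem == nullptr) {
            mem = new DefaultMemory();  // no factory, or the factory declined
        }
        assert(mem != nullptr);
        memory.reset(mem);
    }
};
```

Because registration in this sketch is global, a factory applies to every context created afterwards until it is cleared with `set_memory_factory(nullptr, nullptr)`.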

Usage

  1. Implement a factory function that returns a llama_memory_t
  2. Call llama_set_memory_factory() before llama_init_from_model()
  3. Return nullptr from the factory to fall back to the default implementation

Example

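```cpp
#include "llama.h"

// Hypothetical helpers (not part of llama.cpp): provide your own implementations.
bool           should_use_custom_cache(const struct llama_context_params * params);
llama_memory_t create_my_custom_cache(const struct llama_model * model,
                                      const struct llama_context_params * params);

static llama_memory_t my_cache_factory(
    const struct llama_model * model,
    const struct llama_context_params * params,
    void * user_data
) {
    (void) user_data;  // unused in this example
    if (should_use_custom_cache(params)) {
        return create_my_custom_cache(model, params);
    }
    return nullptr;  // Fall back to the default implementation
}

// Register the factory before context creation
static llama_context * init_with_custom_cache(llama_model * model, llama_context_params cparams) {
    llama_set_memory_factory(my_cache_factory, nullptr);
    return llama_init_from_model(model, cparams);
}
```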
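The example above leaves should_use_custom_cache() and create_my_custom_cache() undefined and ignores user_data. The standalone C++ sketch below shows one hypothetical way to fill them in: a CacheConfig struct is passed as user_data at registration time, and the factory returns a custom cache only for contexts whose n_ctx reaches a threshold, returning nullptr otherwise so the default implementation is used. Model, ContextParams, Memory, MyCustomCache and CacheConfig are stand-ins invented for this sketch, not llama.cpp types, and the threshold policy is only an illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in types for this sketch only (NOT llama.cpp's definitions).
struct Model {};
struct ContextParams { uint32_t n_ctx = 0; };
struct Memory { virtual ~Memory() = default; };

// Hypothetical custom cache, e.g. a compressed KV cache intended for long contexts.
struct MyCustomCache : Memory {
    uint32_t capacity;
    explicit MyCustomCache(uint32_t cap) : capacity(cap) {}
};

// Hypothetical configuration handed to the factory through user_data.
struct CacheConfig {
    uint32_t min_ctx_for_custom;  // use the custom cache only at or above this n_ctx
};

static bool should_use_custom_cache(const ContextParams * params, const CacheConfig * cfg) {
    return params->n_ctx >= cfg->min_ctx_for_custom;
}

static Memory * create_my_custom_cache(const Model *, const ContextParams * params) {
    return new MyCustomCache(params->n_ctx);  // caller takes ownership
}

// Same shape as the example factory: (model, params, user_data).
static Memory * my_cache_factory(const Model * model, const ContextParams * params, void * user_data) {
    assert(params != nullptr);
    const auto * cfg = static_cast<const CacheConfig *>(user_data);
    if (cfg != nullptr && should_use_custom_cache(params, cfg)) {
        return create_my_custom_cache(model, params);
    }
    return nullptr;  // fall back to the default implementation
}
```

Since the factory only receives a raw pointer, the CacheConfig passed as user_data must outlive every context created while the factory is registered.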

Impact

  • Zero overhead when not used (a single null-pointer check at context creation)
  • No breaking changes to existing API
  • 35 lines total across 2 files

@loci-dev force-pushed the main branch 28 times, most recently from 4df802d to 574c51e (December 29, 2025 14:09)
@loci-dev force-pushed the main branch 30 times, most recently from 2f72634 to f2a5c7f (January 31, 2026 11:09)