TurboQuant-MoE

Requires Python 3.11+. Licensed under BUSL-1.1.

TurboQuant-MoE is a KV-cache compression and dynamic MoE expert management engine for LLM inference.

Why it exists

Large-context and MoE inference are usually constrained by VRAM capacity and memory bandwidth rather than compute. TurboQuant targets this bottleneck with:

  • 1/2/3-bit Polar quantization for KV tensors
  • QJL residual correction for fidelity preservation
  • Cross-layer KV sharing and delta-based compression
  • MoE expert cache and prefetch primitives
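The exact Polar codec is not documented here, but the 1-bit path can be illustrated with a minimal sign-and-scale scheme (a sketch only: the function names and scale rule below are assumptions, and the real codec layers QJL residual correction on top):

```python
def quantize_1bit(x: list[float]) -> tuple[list[int], float]:
    """Keep one sign bit per element plus a single per-vector scale.
    The scale is the mean absolute value, which minimizes L2 error
    for a fixed sign pattern (an assumption about the codec)."""
    scale = sum(abs(v) for v in x) / len(x)
    bits = [1 if v >= 0 else 0 for v in x]
    return bits, scale

def dequantize_1bit(bits: list[int], scale: float) -> list[float]:
    """Reconstruct each element as +/-scale from its sign bit."""
    return [scale if b else -scale for b in bits]

bits, scale = quantize_1bit([0.8, -0.3, 1.2, -0.9])
approx = dequantize_1bit(bits, scale)   # every entry is roughly +/-0.8
```

The 2- and 3-bit modes would add more magnitude levels per element, trading compression ratio for fidelity.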

Installation

The project is currently distributed from source.

git clone https://github.com/RemizovDenis/turboquant.git
cd turboquant
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[dev,transformers,benchmark]"

Quick start

import torch  # KV tensors are assumed to be PyTorch tensors

from turboquant.core.turboquant import TurboQuantKVCache, TurboQuantConfig

config = TurboQuantConfig(
    head_dim=128,
    num_heads=32,
    bits=3,                    # Polar quantization width: 1, 2, or 3
    residual_correction=True,  # QJL residual correction
)
cache = TurboQuantKVCache(config)

# Example KV tensors; a (batch, num_heads, seq_len, head_dim) layout is assumed.
key_tensor = torch.randn(1, 32, 1024, 128)
val_tensor = torch.randn(1, 32, 1024, 128)

compressed = cache.compress(keys=key_tensor, values=val_tensor)
recon_k, recon_v = cache.decompress(compressed)
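The cross-layer delta compression listed under "Why it exists" can be sketched as a base-plus-deltas encoding (a toy illustration, not the library's codec: real use would quantize the deltas, and every name below is hypothetical):

```python
def delta_encode(layers: list[list[float]]) -> tuple[list[float], list[list[float]]]:
    """Store layer 0 in full and every later layer as an element-wise
    delta from its predecessor. Adjacent layers' KV tensors tend to be
    similar, so deltas stay small and quantize well; this toy version
    skips the quantization step."""
    deltas = [[cur_v - prev_v for cur_v, prev_v in zip(cur, prev)]
              for prev, cur in zip(layers, layers[1:])]
    return layers[0], deltas

def delta_decode(base: list[float], deltas: list[list[float]]) -> list[list[float]]:
    """Rebuild each layer by accumulating deltas onto the base."""
    layers = [base]
    for d in deltas:
        layers.append([p + v for p, v in zip(layers[-1], d)])
    return layers

kv = [[1.0, 2.0], [1.5, 2.5], [1.5, 2.0]]  # one small KV vector per layer
base, deltas = delta_encode(kv)
assert delta_decode(base, deltas) == kv     # exact round trip
```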

Validate locally

ruff check turboquant tests
ruff format --check turboquant tests
mypy turboquant --strict
pytest tests/ -v --tb=short -x -k "not gpu and not cuda and not triton"

Documentation

Integrations

  • HuggingFace Transformers (TurboQuantCache)
  • Vector databases (Qdrant, ChromaDB, NumPy adapter)
  • Ollama/vLLM integration helpers

Project standards

License

Business Source License 1.1 (BUSL-1.1). Commercial use requires a commercial license. Converts to Apache-2.0 on 2030-04-01.
