TurboQuant-MoE is a KV-cache compression and dynamic MoE expert management engine for LLM inference.
Long-context and MoE inference are usually constrained by VRAM capacity and memory bandwidth. TurboQuant-MoE targets this bottleneck with:
- 1/2/3-bit Polar quantization for KV tensors
- QJL residual correction for fidelity preservation
- Cross-layer KV sharing and delta-based compression
- MoE expert cache and prefetch primitives
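To give a feel for what low-bit KV quantization means in practice, here is a minimal sketch of uniform min-max quantization to 1/2/3-bit codes. This is illustrative only: `quantize_lowbit` is a hypothetical helper, not TurboQuant's Polar quantizer, and it omits the QJL residual-correction step entirely.

```python
import numpy as np

def quantize_lowbit(x: np.ndarray, bits: int = 3):
    """Uniformly quantize a tensor to `bits` bits per value.

    Hypothetical sketch; TurboQuant's actual Polar quantizer and
    QJL residual correction are more sophisticated than this.
    """
    levels = 2 ** bits
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (levels - 1)
    # Map each value to an integer code in [0, levels - 1].
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_lowbit(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Reconstruct an approximation from the integer codes."""
    return q.astype(np.float32) * scale + lo

# Example: quantize a fake key tensor shaped (num_heads, head_dim).
keys = np.random.randn(32, 128).astype(np.float32)
q, lo, scale = quantize_lowbit(keys, bits=3)
recon = dequantize_lowbit(q, lo, scale)
err = np.abs(keys - recon).max()  # bounded by scale / 2
```

The worst-case reconstruction error of this scheme is half a quantization step, which is exactly the fidelity gap that a residual-correction pass (such as QJL in this project) is meant to close.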
The project is currently distributed as source. Clone the repository and install it in editable mode:

```bash
git clone https://github.com/RemizovDenis/turboquant.git
cd turboquant
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[dev,transformers,benchmark]"
```

Quick start:

```python
from turboquant.core.turboquant import TurboQuantKVCache, TurboQuantConfig

config = TurboQuantConfig(
    head_dim=128,
    num_heads=32,
    bits=3,
    residual_correction=True,
)
cache = TurboQuantKVCache(config)
compressed = cache.compress(keys=key_tensor, values=val_tensor)
recon_k, recon_v = cache.decompress(compressed)
```

Development checks:

```bash
ruff check turboquant tests
ruff format --check turboquant tests
mypy turboquant --strict
pytest tests/ -v --tb=short -x -k "not gpu and not cuda and not triton"
```

Integrations:
- HuggingFace Transformers (`TurboQuantCache`)
- Vector databases (Qdrant, ChromaDB, NumPy adapter)
- Ollama/vLLM integration helpers
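As a back-of-the-envelope illustration of why low-bit KV storage matters, the arithmetic below compares an fp16 KV cache against packed 3-bit codes for one layer. The shapes are example values, and the calculation ignores quantization metadata and residual-correction overhead, so real-world ratios will be somewhat lower.

```python
# KV cache size for one layer (keys + values), fp16 baseline vs 3-bit codes.
num_heads, head_dim, seq_len = 32, 128, 8192
elems = 2 * num_heads * head_dim * seq_len  # keys + values
fp16_bytes = elems * 2                      # 16 bits per value
bits = 3
quant_bytes = elems * bits / 8              # packed 3-bit codes
ratio = fp16_bytes / quant_bytes            # 16/3, roughly 5.3x smaller
```

At 1-bit the same arithmetic gives a 16x reduction, which is why the bit width is the dominant knob in the `TurboQuantConfig` example above.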
Business Source License 1.1 (BUSL-1.1). Commercial use requires a commercial license. Converts to Apache-2.0 on 2030-04-01.