KLAWQ: KL-Aware Weight Quantization for Edge AI

KLAWQ is a post-training quantization framework that extends GPTQ with a KL divergence term to better preserve accuracy when deploying large language models on edge devices.

Core Innovation

KLAWQ extends GPTQ by adding a KL divergence term to align quantized model outputs with the original model's distribution:

L(Q) = L_MSE(Q) + β * L_KL(Q)
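The combined objective can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's implementation: the function names are hypothetical, the MSE is taken over raw output logits, the KL direction is assumed to be KL(original || quantized), and τ is assumed to act as a softmax temperature.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax (tau as a temperature is an assumption)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def klawq_style_loss(ref_logits, quant_logits, beta=0.5, tau=1.0):
    """L(Q) = L_MSE(Q) + beta * L_KL(Q), under the assumptions stated above."""
    p = softmax(ref_logits, tau)    # original model distribution
    q = softmax(quant_logits, tau)  # quantized model distribution
    return mse(ref_logits, quant_logits) + beta * kl_divergence(p, q)

# Hypothetical logits from the original and the quantized model.
loss = klawq_style_loss([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], beta=0.5, tau=1.0)
```

With β = 0 the sketch reduces to the plain MSE term; increasing β puts more weight on matching the original model's output distribution.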

The algorithm modifies the Hessian computation as H_tot = H + βA, where H is the GPTQ Hessian and A is the KL Hessian matrix.
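A minimal sketch of the Hessian combination follows. It assumes H is the layer-wise Hessian from the GPTQ paper, H = 2XXᵀ for calibration inputs X, and treats A as a matrix supplied by the caller, since how A is computed is not described here. The helper names are hypothetical and the code uses nested lists to stay dependency-free.

```python
def gptq_hessian(X):
    """H = 2 * X X^T for inputs X given as d_in rows of n_samples values."""
    d = len(X)
    return [
        [2.0 * sum(a * b for a, b in zip(X[i], X[j])) for j in range(d)]
        for i in range(d)
    ]

def combine_hessians(H, A, beta):
    """H_tot = H + beta * A, element-wise over two square matrices."""
    if len(H) != len(A) or any(len(h) != len(a) for h, a in zip(H, A)):
        raise ValueError("H and A must have the same shape")
    return [
        [h + beta * a for h, a in zip(h_row, a_row)]
        for h_row, a_row in zip(H, A)
    ]

# Two input features, two calibration samples (illustrative values).
X = [[1.0, 2.0],
     [3.0, 4.0]]
H = gptq_hessian(X)                    # [[10.0, 22.0], [22.0, 50.0]]
A = [[1.0, 0.0],
     [0.0, 1.0]]                       # placeholder for the KL Hessian
H_tot = combine_hessians(H, A, 0.5)    # [[10.5, 22.0], [22.0, 50.5]]
```

Because the KL term only enters through H_tot, the rest of a GPTQ-style weight update can stay unchanged in this formulation.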

Key Components

Configuration: Hyperparameters (β, τ) in KLAWQ/gptqmodel/quantization/config.py
Core Algorithm: KL Hessian computation in KLAWQ/kl-aware-quant/quantization/gptq.py
Quantization Engine: Low-level operations in KLAWQ/kl-aware-quant/quantization/quantizer.py
Analysis Notebooks: Experimental validation in kl-hessian-gptq-*.ipynb files

Quick Start

Clone and Setup:

git clone https://github.com/ha405/Compression-Framework-for-EdgeAI
cd Compression-Framework-for-EdgeAI

Install Dependencies: Install PyTorch, transformers, and other requirements from requirements.txt
Run Quantization: Use the Jupyter notebooks for experimentation or integrate the KLAWQ modules directly

Results

Experiments on GPT-2 at 8-bit precision demonstrate improved perplexity scores compared to vanilla GPTQ while maintaining post-training quantization efficiency.
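For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token, so lower values are better. The sketch below shows the standard definition only; it is not the evaluation code used for these experiments, and the function name is hypothetical.

```python
import math

def perplexity(token_log_probs):
    """exp(mean negative log-likelihood), using natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that assigns probability 0.25 to every token behaves like a
# uniform choice among 4 options, so its perplexity is approximately 4.
uniform_ppl = perplexity([math.log(0.25)] * 10)
```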

Notes

The framework depends on PyTorch >=2.4.1, transformers >=4.51.2, and FastAPI for model serving. The project is modular, with separate components for adapter functionality, model definitions, and processing loops; the core KLAWQ innovation is concentrated in the quantization modules.
