Releases · Disty0/sdnq
v0.1.7
- Add support for `nn.Embedding` quantization.
- Add `modules_quant_config` to be able to use a different quant config on some layers (see the sketch after this list).
- Make `dynamic_loss_threshold` auto-selected by default.
- Add `use_gram_ns` option to Muon.
- Add Tensor Descriptor kernels to Triton MM for Intel Arc GPUs.
- Add more optimized Triton MM configs.
- Add Gemma4 keys.
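A minimal sketch of how the per-module override might be used, assuming an `SDNQConfig`-style entry point and a dict mapping layer names to their own settings; the class name, import path, and argument shapes are assumptions, not confirmed API, so check the sdnq README for the real interface.

```python
# Hypothetical sketch, not confirmed sdnq API: quantize most layers with the
# default config, but keep the embedding layer at a wider type via
# modules_quant_config (option introduced in this release).
from sdnq import SDNQConfig  # assumed import path

quant_config = SDNQConfig(
    weights_dtype="int8",  # default type for most layers
    modules_quant_config={
        # hypothetical per-layer override keyed by module name
        "embed_tokens": {"weights_dtype": "int16"},
    },
)
```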
Full Changelog: v0.1.6...v0.1.7
v0.1.6
- Add 9, 10, 11, 12, 13, 14, 15 and 16 bit integer and floating point format support.
- SDNQ now supports every bit width from 1 to 16, totaling 176 supported formats (a generic illustration of arbitrary-width integer quantization follows this list).
- Allow all supported types in dynamic quantization.
- Add support for basic CPU offloading on optimizer states.
- Update the default optimizer betas to match PyTorch defaults.
- Set dequantize_fp32 to True by default and cast svd to torch_dtype.
- Handle FP64 inputs and don't downcast them to FP32 when quantizing.
- Fix FP16 MM being selected instead of FP8 MM for FP formats other than FP8.
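PyTorch has no native dtypes for widths such as 9 to 15 bit, so formats like these are typically expressed as a quantized integer plus a scale stored in a wider container. The snippet below is a generic illustration of symmetric N-bit integer quantization, not sdnq's internal code.

```python
# Generic illustration (not sdnq's implementation): symmetric N-bit integer
# quantization for any width up to 16 bit, stored in an int16 container.
import torch

def quantize_symmetric(weight: torch.Tensor, bits: int):
    qmax = 2 ** (bits - 1) - 1                              # e.g. 255 for 9 bit
    scale = weight.abs().amax(dim=-1, keepdim=True) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int16), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor, dtype=torch.float32):
    return q.to(dtype) * scale
```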
v0.1.5
- Rename norm_mode none to clip on optimizers.
- Fix torch.compile on optimizers.
- Disable contiguous_mm for fp8 mm.
- Cast scales, zero_point and svd to torch_dtype before quantization (see the sketch after this list).
- Add clamping to std normalization on dynamic quantization.
- Add CosmosTransformer3DModel keys.
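As a rough picture of why that cast matters, here is a generic affine-quantization sketch, not sdnq's code: the scale and zero point used to build the quantized weight must be the same values, in the same dtype, that are later used to dequantize.

```python
# Generic illustration (not sdnq's implementation): affine uint8 quantization
# with the scale and zero point cast to the target torch_dtype up front, so
# quantize and dequantize see identical values.
import torch

def quantize_affine_uint8(weight: torch.Tensor, torch_dtype=torch.bfloat16):
    w_min = weight.amin(dim=-1, keepdim=True)
    w_max = weight.amax(dim=-1, keepdim=True)
    scale = ((w_max - w_min) / 255.0).to(torch_dtype)
    zero_point = torch.round(-w_min / scale).to(torch_dtype)
    q = torch.clamp(torch.round(weight / scale + zero_point), 0, 255).to(torch.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point, dtype=torch.float32):
    return (q.to(dtype) - zero_point.to(dtype)) * scale.to(dtype)
```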
Full Changelog: v0.1.4...v0.1.5
v0.1.4
- Add Kahan summation with `use_kahan` to optimizers (see the sketch after this list).
- Add basic CPU offload with `offload_buffers` to optimizers.
- Add torch.compile support with `use_torch_compile` to optimizers.
- Add minimum ndim and numel checks to optimizers.
- Add more guards against NaNs on optimizers.
- Add more broadcast, view, split and select ops to SDNQTensor for DeepSpeed compatibility.
- Add SDNQLayer, SDNQLinear and SDNQConv layer wrappers.
- Add GLM Image and LTX 2 keys to the model specific keys.
- Disable stochastic rounding on backward pass with quantized matmul.
- Set the default max_shard_size to 5GB.
- Set the minimum supported Python version to 3.10.
- Set skip compile on dynamic quantization.
- Register sdnq_training as a backend to diffusers and transformers.
- Refactor optimizers again to make them more easily expandable.
- Fix compatibility with outdated PyTorch versions.
- Fix quantized matmul with packed weights.
- Fix stochastic rounding with float type weights.
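For reference, Kahan (compensated) summation keeps an extra buffer that carries the rounding error lost when a small update is added to a low-precision parameter. The sketch below shows the general technique behind a `use_kahan`-style flag; it is not sdnq's optimizer code.

```python
# Generic illustration of Kahan-compensated parameter updates (the technique
# behind use_kahan), not sdnq's optimizer implementation.
import torch

@torch.no_grad()
def kahan_sgd_step(param: torch.Tensor, grad: torch.Tensor,
                   compensation: torch.Tensor, lr: float = 1e-3) -> None:
    # Fold the previously lost rounding error back into this step's update.
    update = grad.mul(-lr).add_(compensation)
    new_param = param + update
    # Store what was lost when the update hit the (possibly bf16) parameter.
    compensation.copy_(update - (new_param - param))
    param.copy_(new_param)
```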
Full Changelog: v0.1.3...v0.1.4
v0.1.3
- Add a new stack of custom floating point types, totaling 69 different floating point formats.
  Notable new ones are: `float7_e3m3fn`, `float6_e3m2fn`, `float5_e2m2fn`, `float4_e2m1fn`, `float3_e1m1fn`, `float2_e1m0fn`.
- Add `use_dynamic_quantization` option (see the sketch after this list).
  Enabling this option will dynamically select a per-layer quantization type based on the `dynamic_loss_threshold`.
  `weights_dtype` will be used as the minimum allowed quantization type when this option is enabled.
- Rename `sdnq.training.sdnq_post_load_quant` to `sdnq.training.sdnq_training_post_load_quant`.
- Check for incompatible weight shapes for matmul in `apply_sdnq_options_to_module`.
- Remove forced `uint4` minimum from conv layers.
- Add integer stochastic rounding to `copy_stochastic_`.
- Refactor optimizer code in a more generalized way.
- Reduce optimizer memory usage.
- Fix the wrong FP8 type being set as the matmul type.
- Fix CAME optimizer.
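A minimal sketch of how dynamic quantization might be enabled, assuming the same `SDNQConfig`-style entry point as above; only `use_dynamic_quantization`, `dynamic_loss_threshold` and `weights_dtype` are named in these notes, and everything else (class name, import path, the threshold value) is an assumption.

```python
# Hypothetical sketch, not confirmed sdnq API: let sdnq pick a per-layer type
# based on dynamic_loss_threshold, with weights_dtype as the minimum allowed type.
from sdnq import SDNQConfig  # assumed import path

quant_config = SDNQConfig(
    weights_dtype="float4_e2m1fn",   # floor type when dynamic selection is enabled
    use_dynamic_quantization=True,
    dynamic_loss_threshold=0.03,     # hypothetical value; auto-selected by default since v0.1.7
)
```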
Full Changelog: v0.1.2...v0.1.3
v0.1.2
- Add Transformers V5 support.
- Add basic sanity check for Triton.
- Add nan check and grad clip to optimizers.
- Fix FP8 CKPT MM on training.
- Fix FP16 MM with FP8 weights.
- Fix end to end torch.compile on training.
- Fix use_stochastic_rounding getting ignored.
- Fix quantization_device getting ignored on post load quant.
- Improve integer stochastic rounding performance on backward pass.
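For context, stochastic rounding rounds up or down with probability proportional to the discarded fraction instead of always rounding to nearest, which keeps low-precision accumulation unbiased. The snippet below is a generic FP32-to-BF16 illustration of the idea, not sdnq's `copy_stochastic_` kernel.

```python
# Generic illustration of stochastic rounding from FP32 to BF16; not sdnq's
# copy_stochastic_ implementation.
import torch

def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    assert x.dtype == torch.float32
    bits = x.view(torch.int32)
    # Add random noise to the 16 mantissa bits BF16 discards, then truncate:
    # the chance of rounding up equals the discarded fraction.
    noise = torch.randint_like(bits, 0, 1 << 16)
    rounded = (bits + noise) & -65536  # keep only the upper 16 bits
    return rounded.view(torch.float32).to(torch.bfloat16)
```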
Full Changelog: v0.1.1...v0.1.2
v0.1.1
Add PyPI workflow and minor fixes.
Full Changelog: v0.1.0...v0.1.1