Releases · Disty0/sdnq
v0.1.7
- Add support for `nn.Embedding` quantization.
- Add `modules_quant_config` to be able to use a different quant config on some layers (see the sketch after this list).
- Make `dynamic_loss_threshold` auto-selected by default.
- Add `use_gram_ns` option to Muon.
- Add Tensor Descriptor kernels to Triton MM for Intel Arc GPUs.
- Add more optimized Triton MM configs.
- Add Gemma4 keys.
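A minimal sketch of how the per-module override might be used, assuming an `SDNQConfig`-style entry point and a dict mapping layer names to their own settings; the class name, import path, and argument shapes are assumptions, not confirmed API, so check the sdnq README for the real interface.

```python
# Hypothetical sketch, not confirmed sdnq API: quantize most layers with the
# default config, but keep the embedding layer at a wider type via
# modules_quant_config (option introduced in this release).
from sdnq import SDNQConfig  # assumed import path

quant_config = SDNQConfig(
    weights_dtype="int8",  # default type for most layers
    modules_quant_config={
        # hypothetical per-layer override keyed by module name
        "embed_tokens": {"weights_dtype": "int16"},
    },
)
```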
Full Changelog: v0.1.6...v0.1.7
v0.1.6
- Add 9, 10, 11, 12, 13, 14, 15 and 16 bit integer and floating point format support.
- SDNQ now supports every bit width from 1 to 16, totaling 176 supported formats (a generic illustration of arbitrary-width integer quantization follows this list).
- Allow all supported types in dynamic quantization.
- Add support for basic CPU offloading on optimizer states.
- Update the default optimizer betas to match PyTorch defaults.
- Set dequantize_fp32 to True by default and cast svd to torch_dtype.
- Handle FP64 inputs and don't downcast them to FP32 when quantizing.
- Fix FP16 MM being selected instead of FP8 MM for FP formats other than FP8.
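PyTorch has no native dtypes for widths such as 9 to 15 bit, so formats like these are typically expressed as a quantized integer plus a scale stored in a wider container. The snippet below is a generic illustration of symmetric N-bit integer quantization, not sdnq's internal code.

```python
# Generic illustration (not sdnq's implementation): symmetric N-bit integer
# quantization for any width up to 16 bit, stored in an int16 container.
import torch

def quantize_symmetric(weight: torch.Tensor, bits: int):
    qmax = 2 ** (bits - 1) - 1                              # e.g. 255 for 9 bit
    scale = weight.abs().amax(dim=-1, keepdim=True) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int16), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor, dtype=torch.float32):
    return q.to(dtype) * scale
```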
v0.1.5
- Rename norm_mode none to clip on optimizers.
- Fix torch.compile on optimizers.
- Disable contiguous_mm for fp8 mm.
- Cast scales, zero_point and svd to torch_dtype before quantization (see the sketch after this list).
- Add clamping to std normalization on dynamic quantization.
- Add CosmosTransformer3DModel keys.
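As a rough picture of why that cast matters, here is a generic affine-quantization sketch, not sdnq's code: the scale and zero point used to build the quantized weight must be the same values, in the same dtype, that are later used to dequantize.

```python
# Generic illustration (not sdnq's implementation): affine uint8 quantization
# with the scale and zero point cast to the target torch_dtype up front, so
# quantize and dequantize see identical values.
import torch

def quantize_affine_uint8(weight: torch.Tensor, torch_dtype=torch.bfloat16):
    w_min = weight.amin(dim=-1, keepdim=True)
    w_max = weight.amax(dim=-1, keepdim=True)
    scale = ((w_max - w_min) / 255.0).to(torch_dtype)
    zero_point = torch.round(-w_min / scale).to(torch_dtype)
    q = torch.clamp(torch.round(weight / scale + zero_point), 0, 255).to(torch.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point, dtype=torch.float32):
    return (q.to(dtype) - zero_point.to(dtype)) * scale.to(dtype)
```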
Full Changelog: v0.1.4...v0.1.5
v0.1.4
- Add Kahan summation with `use_kahan` to optimizers (see the sketch after this list).
- Add basic CPU offload with `offload_buffers` to optimizers.
- Add torch.compile support with `use_torch_compile` to optimizers.
- Add minimum ndim and numel checks to optimizers.
- Add more guards against NaNs on optimizers.
- Add more broadcast, view, split and select ops to SDNQTensor for DeepSpeed compatibility.
- Add SDNQLayer, SDNQLinear and SDNQConv layer wrappers.
- Add GLM Image and LTX 2 keys to the model specific keys.
- Disable stochastic rounding on backward pass with quantized matmul.
- Set the default max_shard_size to 5GB.
- Set the minimum supported Python version to 3.10.
- Set skip compile on dynamic quantization.
- Register sdnq_training as a backend to diffusers and transformers.
- Refactor optimizers again to make them more easily expandable.
- Fix compatibility with outdated PyTorch versions.
- Fix quantized matmul with packed weights.
- Fix stochastic rounding with float type weights.
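For reference, Kahan (compensated) summation keeps an extra buffer that carries the rounding error lost when a small update is added to a low-precision parameter. The sketch below shows the general technique behind a `use_kahan`-style flag; it is not sdnq's optimizer code.

```python
# Generic illustration of Kahan-compensated parameter updates (the technique
# behind use_kahan), not sdnq's optimizer implementation.
import torch

@torch.no_grad()
def kahan_sgd_step(param: torch.Tensor, grad: torch.Tensor,
                   compensation: torch.Tensor, lr: float = 1e-3) -> None:
    # Fold the previously lost rounding error back into this step's update.
    update = grad.mul(-lr).add_(compensation)
    new_param = param + update
    # Store what was lost when the update hit the (possibly bf16) parameter.
    compensation.copy_(update - (new_param - param))
    param.copy_(new_param)
```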
Full Changelog: v0.1.3...v0.1.4
v0.1.3
- Add a new stack of custom floating point types, totaling 69 different floating point formats.
  Notable new ones are: `float7_e3m3fn`, `float6_e3m2fn`, `float5_e2m2fn`, `float4_e2m1fn`, `float3_e1m1fn`, `float2_e1m0fn`.
- Add `use_dynamic_quantization` option (see the sketch after this list).
  Enabling this option will dynamically select a per-layer quantization type based on the `dynamic_loss_threshold`.
  `weights_dtype` will be used as the minimum allowed quantization type when this option is enabled.
- Rename `sdnq.training.sdnq_post_load_quant` to `sdnq.training.sdnq_training_post_load_quant`.
- Check for incompatible weight shapes for matmul in `apply_sdnq_options_to_module`.
- Remove forced `uint4` minimum from conv layers.
- Add integer stochastic rounding to `copy_stochastic_`.
- Refactor optimizer code in a more generalized way.
- Reduce optimizer memory usage.
- Fix the wrong FP8 type being set as the matmul type.
- Fix CAME optimizer.
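A minimal sketch of how dynamic quantization might be enabled, assuming the same `SDNQConfig`-style entry point as above; only `use_dynamic_quantization`, `dynamic_loss_threshold` and `weights_dtype` are named in these notes, and everything else (class name, import path, the threshold value) is an assumption.

```python
# Hypothetical sketch, not confirmed sdnq API: let sdnq pick a per-layer type
# based on dynamic_loss_threshold, with weights_dtype as the minimum allowed type.
from sdnq import SDNQConfig  # assumed import path

quant_config = SDNQConfig(
    weights_dtype="float4_e2m1fn",   # floor type when dynamic selection is enabled
    use_dynamic_quantization=True,
    dynamic_loss_threshold=0.03,     # hypothetical value; auto-selected by default since v0.1.7
)
```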
Full Changelog: v0.1.2...v0.1.3
v0.1.2
- Add Transformers V5 support.
- Add basic sanity check for Triton.
- Add nan check and grad clip to optimizers.
- Fix FP8 CKPT MM on training.
- Fix FP16 MM with FP8 weights.
- Fix end to end torch.compile on training.
- Fix use_stochastic_rounding getting ignored.
- Fix quantization_device getting ignored on post load quant.
- Improve integer stochastic rounding performance on backward pass.
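For context, stochastic rounding rounds up or down with probability proportional to the discarded fraction instead of always rounding to nearest, which keeps low-precision accumulation unbiased. The snippet below is a generic FP32-to-BF16 illustration of the idea, not sdnq's `copy_stochastic_` kernel.

```python
# Generic illustration of stochastic rounding from FP32 to BF16; not sdnq's
# copy_stochastic_ implementation.
import torch

def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    assert x.dtype == torch.float32
    bits = x.view(torch.int32)
    # Add random noise to the 16 mantissa bits BF16 discards, then truncate:
    # the chance of rounding up equals the discarded fraction.
    noise = torch.randint_like(bits, 0, 1 << 16)
    rounded = (bits + noise) & -65536  # keep only the upper 16 bits
    return rounded.view(torch.float32).to(torch.bfloat16)
```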
Full Changelog: v0.1.1...v0.1.2
v0.1.1
Add PyPI workflow and minor fixes.
Full Changelog: v0.1.0...v0.1.1