Releases: Disty0/sdnq

v0.1.7

14 Apr 14:32

  • Add support for nn.Embedding quantization.
  • Add modules_quant_config to allow using a different quant config on specific layers (see the sketch after this list).
  • Make dynamic_loss_threshold auto-selected by default.
  • Add use_gram_ns option to Muon.
  • Add Tensor Descriptor kernels to Triton MM for Intel Arc GPUs.
  • Add more optimized Triton MM configs.
  • Add Gemma4 keys.
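
A minimal usage sketch for the embedding quantization and modules_quant_config items above, assuming SDNQConfig is the quantization config entry point registered with transformers (see v0.1.4); the class name, the placeholder model id and the dict shape of the mapping are assumptions, while the option names come from this release:

    # Hypothetical sketch: per-module quant config overrides.
    # Only modules_quant_config and weights_dtype are named in this
    # release; SDNQConfig and the mapping shape are assumptions.
    import torch
    from transformers import AutoModelForCausalLM
    from sdnq import SDNQConfig

    quant_config = SDNQConfig(
        weights_dtype="int8",  # default quant type for most layers
        modules_quant_config={
            # nn.Embedding layers are now quantizable too; keep them at a
            # wider type than the rest of the model (hypothetical key)
            "embed_tokens": {"weights_dtype": "int16"},
        },
    )

    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2.5-0.5B",  # placeholder model id
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )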

Full Changelog: v0.1.6...v0.1.7

v0.1.6

13 Mar 12:12

  • Add 9, 10, 11, 12, 13, 14, 15 and 16 bit integer and floating point format support (see the sketch after this list).
    • SDNQ now supports every format from 1 bit to 16 bit, totaling 176 supported formats.
  • Allow all supported types in dynamic quantization.
  • Add support for basic CPU offloading on optimizer states.
  • Update the default optimizer betas to match PyTorch defaults.
  • Set dequantize_fp32 to True by default and cast svd to torch_dtype.
  • Handle FP64 inputs and don't downcast them to FP32 when quantizing.
  • Fix FP16 MM being selected instead of FP8 MM on FP formats other than FP8.
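
A minimal sketch of requesting one of the new 9 to 16 bit formats, assuming the format names extend the existing naming pattern (e.g. "uint12"); the format string, class name and model id are assumptions, not a confirmed API:

    # Hypothetical sketch: load a model with one of the new wider
    # formats; "uint12" is an assumed format name.
    import torch
    from transformers import AutoModelForCausalLM
    from sdnq import SDNQConfig

    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2.5-0.5B",  # placeholder model id
        quantization_config=SDNQConfig(weights_dtype="uint12"),
        torch_dtype=torch.bfloat16,
    )

Full Changelog: v0.1.5...v0.1.6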

v0.1.5

24 Feb 12:56

  • Rename norm_mode none to clip on optimizers.
  • Fix torch.compile on optimizers.
  • Disable contiguous_mm for FP8 MM.
  • Cast scales, zero_point and svd to torch_dtype before quantization.
  • Add clamping to std normalization on dynamic quantization.
  • Add CosmosTransformer3DModel keys.

Full Changelog: v0.1.4...v0.1.5

v0.1.4

19 Jan 14:41

  • Add Kahan summation to optimizers with use_kahan (see the sketch after this list).
  • Add basic CPU offload with offload_buffers to optimizers.
  • Add torch.compile support with use_torch_compile to optimizers.
  • Add minimum ndim and numel checks to optimizers.
  • Add more guards against NaNs on optimizers.
  • Add more broadcast, view, split and select ops to SDNQTensor for DeepSpeed compatibility.
  • Add SDNQLayer, SDNQLinear and SDNQConv layer wrappers.
  • Add GLM Image and LTX 2 keys to the model-specific keys.
  • Disable stochastic rounding on backward pass with quantized matmul.
  • Set the default max_shard_size to 5GB.
  • Set the minimum supported Python version to 3.10.
  • Skip torch.compile on dynamic quantization.
  • Register sdnq_training as a backend to diffusers and transformers.
  • Refactor optimizers again to make them more easily extensible.
  • Fix compatibility with outdated PyTorch versions.
  • Fix quantized matmul with packed weights.
  • Fix stochastic rounding with float type weights.
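
For reference, a minimal sketch of the Kahan (compensated) summation technique behind use_kahan, applied to a low-precision parameter update in plain PyTorch; this illustrates the general idea, not SDNQ's actual optimizer code:

    # Illustrative sketch of Kahan (compensated) summation for an
    # optimizer update; general technique, not SDNQ's implementation.
    import torch

    def kahan_update_(param: torch.Tensor, update: torch.Tensor,
                      compensation: torch.Tensor) -> None:
        # Fold in the low-order bits lost on the previous step.
        corrected = update + compensation
        new_param = param + corrected
        # (new_param - param) is what was actually added after rounding;
        # store the difference so the error is re-applied next step.
        compensation.copy_(corrected - (new_param - param))
        param.copy_(new_param)

    # Usage: keep one compensation buffer per parameter.
    p = torch.randn(4, dtype=torch.bfloat16)
    comp = torch.zeros_like(p)
    kahan_update_(p, torch.full_like(p, 1e-3), comp)

The compensation buffer recovers the bits a low-precision add would otherwise discard each step, which is why compensated summation is a common remedy for small updates vanishing in bfloat16 training.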

Full Changelog: v0.1.3...v0.1.4

v0.1.3

27 Dec 16:58

  • Add a new stack of custom floating point types, totaling 69 different floating point formats.
    Notable new ones: float7_e3m3fn, float6_e3m2fn, float5_e2m2fn, float4_e2m1fn, float3_e1m1fn, float2_e1m0fn
  • Add use_dynamic_quantization option (see the sketch after this list).
    Enabling this option will dynamically select a per-layer quantization type based on the dynamic_loss_threshold.
    weights_dtype will be used as the minimum allowed quantization type when this option is enabled.
  • Rename sdnq.training.sdnq_post_load_quant to sdnq.training.sdnq_training_post_load_quant.
  • Check for incompatible weight shapes for matmul in apply_sdnq_options_to_module.
  • Remove forced uint4 minimum from conv layers.
  • Add integer stochastic rounding to copy_stochastic_.
  • Refactor the optimizer code in a more generalized way.
  • Reduce optimizer memory usage.
  • Fix the wrong FP8 type being set as the matmul type.
  • Fix CAME optimizer.
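
A minimal configuration sketch for the dynamic quantization behavior described above; the option names come from this release, while SDNQConfig and the threshold value are assumptions:

    # Hypothetical sketch: dynamic per-layer quantization type selection.
    from sdnq import SDNQConfig

    quant_config = SDNQConfig(
        weights_dtype="uint4",          # minimum allowed quantization type
        use_dynamic_quantization=True,  # select a per-layer type dynamically
        dynamic_loss_threshold=0.1,     # illustrative value; auto-selected
                                        # by default as of v0.1.7
    )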

Full Changelog: v0.1.2...v0.1.3

v0.1.2

09 Dec 18:16

  • Add Transformers V5 support.
  • Add basic sanity check for Triton.
  • Add NaN check and grad clip to optimizers.
  • Fix FP8 CKPT MM on training.
  • Fix FP16 MM with FP8 weights.
  • Fix end-to-end torch.compile on training.
  • Fix use_stochastic_rounding getting ignored.
  • Fix quantization_device getting ignored on post load quant.
  • Improve integer stochastic rounding performance on backward pass (see the sketch below).
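
For context, a minimal sketch of integer stochastic rounding: each value rounds up or down at random with probability proportional to proximity, so the result is unbiased in expectation. This illustrates the general technique, not SDNQ's copy_stochastic_ implementation:

    # Illustrative sketch of integer stochastic rounding; the expected
    # value of the result equals the input, so rounding is unbiased.
    import torch

    def stochastic_round(x: torch.Tensor) -> torch.Tensor:
        low = x.floor()
        frac = x - low                        # distance past the lower integer
        round_up = torch.rand_like(x) < frac  # round up with probability frac
        return low + round_up.to(x.dtype)

    x = torch.tensor([0.3, 1.7, -0.2])
    print(stochastic_round(x))  # e.g. tensor([0., 2., 0.])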

Full Changelog: v0.1.1...v0.1.2

v0.1.1

29 Nov 16:30

Add PyPI workflow and minor fixes.

Full Changelog: v0.1.0...v0.1.1

v0.1.0

28 Nov 01:00

v0.1.0 release