Diffusers 0.37.0: Modular Diffusers, New image and video pipelines, multiple core library improvements, and more 🔥

@sayakpaul sayakpaul released this 05 Mar 15:05
· 18 commits to main since this release

Modular Diffusers

Modular Diffusers introduces a new way to build diffusion pipelines by composing reusable blocks. Instead of writing entire pipelines from scratch, you can now mix and match building blocks to create custom workflows tailored to your specific needs! This complements the existing DiffusionPipeline class, providing a more flexible way to create custom diffusion pipelines.

Find more details on how to get started with Modular Diffusers here, and also check out the announcement post.
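To give a feel for the composition idea, here is a toy sketch of building a pipeline from reusable blocks that share state. The class and method names below are illustrative only, not the actual Modular Diffusers API; see the docs linked above for the real interfaces.

```python
# Toy illustration of composing a pipeline from reusable blocks.
# NOTE: these names are hypothetical, not the Modular Diffusers API.

class Block:
    """One pipeline step that transforms a shared state dict."""
    def __call__(self, state: dict) -> dict:
        raise NotImplementedError


class EncodePrompt(Block):
    def __call__(self, state):
        state["embeds"] = f"embeds({state['prompt']})"
        return state


class Denoise(Block):
    def __call__(self, state):
        state["latents"] = f"denoised({state['embeds']})"
        return state


class Decode(Block):
    def __call__(self, state):
        state["image"] = f"image({state['latents']})"
        return state


class SequentialBlocks(Block):
    """Compose blocks by running them in order over the shared state."""
    def __init__(self, blocks):
        self.blocks = blocks

    def __call__(self, state):
        for block in self.blocks:
            state = block(state)
        return state


# Mix and match blocks to form a custom workflow.
pipe = SequentialBlocks([EncodePrompt(), Denoise(), Decode()])
out = pipe({"prompt": "a cat"})
print(out["image"])
```

Swapping one block (say, a different denoising strategy) leaves the rest of the workflow untouched, which is the core appeal of the block-based design.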

New Pipelines and Models

Image 🌆

  • Z-Image Omni Base: Z-Image is the foundation model of the Z-Image family, engineered for high quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom. Thanks to @RuoyiDu for contributing this in #12857.
  • Flux2 Klein: FLUX.2 [Klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. Built for applications that require real-time image generation without sacrificing quality, it runs on consumer hardware with as little as 13GB of VRAM.
  • Qwen Image Layered: Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks inherent editability: each layer can be independently manipulated without affecting other content. Thanks to @naykun for contributing this in #12853.
  • FIBO Edit: Fibo Edit is an 8B parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows. Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments. Its lightweight architecture is designed for deep customization, empowering researchers to build specialized "Edit" models for domain-specific tasks while delivering top-tier aesthetic quality. Thanks to @galbria for contributing it in #12930.
  • Cosmos Predict2.5: Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world. Thanks to @miguelmartin75 for contributing it in #12852.
  • Cosmos Transfer2.5: Cosmos-Transfer2.5 is a conditional world generation model with adaptive multimodal control that produces high-quality world simulations conditioned on multiple control inputs. These inputs can span different modalities, including edges, blurred video, segmentation maps, and depth maps. Thanks to @miguelmartin75 for contributing it in #13066.
  • GLM-Image: GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In general image generation quality, it is on par with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios. Thanks to @zRzRzRzRzRzRzR for contributing it in #12973.
  • RAE: Representation Autoencoders (RAEs) are an exciting alternative to the traditional VAEs typically used in latent-space diffusion models for image generation. RAEs leverage pre-trained vision encoders and train lightweight decoders for the task of reconstruction. Thanks to @Ando233 for contributing this in #13046.
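The RAE recipe above (keep a pre-trained encoder frozen, fit only a small decoder for reconstruction) can be illustrated with a deliberately tiny sketch. Here the "pre-trained encoder" is just a fixed random projection and the "lightweight decoder" is a single linear map fitted by least squares; this is a conceptual analogy, not the actual RAE architecture or training recipe.

```python
# Conceptual sketch of the RAE idea: freeze the encoder, train only a
# lightweight decoder for reconstruction. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

# "Images": 256 samples of 32-dim data.
x = rng.normal(size=(256, 32))

# Frozen "pre-trained encoder": a fixed projection to a 16-dim latent space.
enc = rng.normal(size=(32, 16))
z = x @ enc  # latents; encoder weights are never updated

# Lightweight decoder: one linear map, fitted by least squares on the
# reconstruction objective (the only trained component).
dec, *_ = np.linalg.lstsq(z, x, rcond=None)
x_hat = z @ dec

# Compare the fitted decoder against an untrained (random) one.
err_trained = np.mean((x - x_hat) ** 2)
err_random = np.mean((x - z @ rng.normal(size=(16, 32))) ** 2)
print(err_trained, err_random)
```

The point of the sketch is that all learning happens in the decoder, which is what makes the approach cheap relative to training a full VAE end to end.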

Video + audio 🎥 🎼

  • LTX-2: LTX-2 is an audio-conditioned text-to-video generation model that can generate videos with synced audio. Both full and distilled model inference are supported, as well as two-stage inference with spatial sampling. We also support a conditioning pipeline that allows passing different conditions (such as images, series of images, etc.). Check out the docs to learn more!
  • Helios: Helios is a 14B video generation model that runs at 17 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching a strong baseline in quality. Thanks to @SHYuanBest for contributing this in #13208.

Improvements to Core Library

New caching methods

New context-parallelism (CP) backends

Misc

  • Mambo-G Guidance: New guider implementation (#12862)
  • Laplace Scheduler for DDPM (#11320)
  • Custom Sigmas in UniPCMultistepScheduler (#12109)
  • MultiControlNet support for SD3 Inpainting (#11251)
  • Context parallel in native flash attention (#12829)
  • NPU Ulysses Attention Support (#12919)
  • Fix Wan 2.1 I2V Context Parallel Inference (#12909)
  • Fix Qwen-Image Context Parallel Inference (#12970)
  • Introduction of the @apply_lora_scale decorator to simplify model definitions (#12994)
  • Introduction of pipeline-level “cpu” device_map (#12811)
  • Enable CP for kernels-based attention backends (#12812)
  • Diffusers is fully functional with Transformers V5 (#12976)

A lot of the above features/improvements came as part of the MVP program we have been running. Immense thanks to the contributors!

Bug Fixes

  • Fix QwenImageEditPlus on NPU (#13017)
  • Fix MT5Tokenizer → use T5Tokenizer for Transformers v5.0+ compatibility (#12877)
  • Fix Wan/WanI2V patchification (#13038)
  • Fix LTX-2 inference with num_videos_per_prompt > 1 and CFG (#13121)
  • Fix Flux2 img2img prediction (#12855)
  • Fix QwenImage txt_seq_lens handling (#12702)
  • Fix prefix_token_len bug (#12845)
  • Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
  • Fix is_fsdp determination (#12960)
  • Fix GLM-Image get_image_features API (#13052)
  • Fix Wan 2.2 when either transformer isn't present (#13055)
  • Fix guider issue (#13147)
  • Fix torchao quantizer for new versions (#12901)
  • Fix GGUF for unquantized types with unquantize kernels (#12498)
  • Make Qwen hidden states contiguous for torchao (#13081)
  • Make Flux hidden states contiguous (#13068)
  • Fix Kandinsky 5 hardcoded CUDA autocast (#12814)
  • Fix aiter availability check (#13059)
  • Fix attention mask check for unsupported backends (#12892)
  • Allow prompt and prior_token_ids simultaneously in GlmImagePipeline (#13092)
  • GLM-Image batch support (#13007)
  • Cosmos 2.5 Video2World frame extraction fix (#13018)
  • ResNet: only use contiguous in training mode (#12977)

All commits

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @delmalih
    • Improve docstrings and type hints in scheduling_dpmsolver_singlestep.py (#12798)
    • Improve docstrings and type hints in scheduling_edm_euler.py (#12871)
    • Improve docstrings and type hints in scheduling_consistency_decoder.py (#12928)
    • Improve docstrings and type hints in scheduling_consistency_models.py (#12931)
    • Improve docstrings and type hints in scheduling_cosine_dpmsolver_multistep.py (#12936)
    • [Docs] Replace root CONTRIBUTING.md with symlink to source docs (#12986)
    • Improve docstrings and type hints in scheduling_ddim_cogvideox.py (#12992)
    • Improve docstrings and type hints in scheduling_ddim_flax.py (#13010)
    • Improve docstrings and type hints in scheduling_ddim_inverse.py (#13020)
    • Improve docstrings and type hints in scheduling_ddim_parallel.py (#13023)
    • Improve docstrings and type hints in scheduling_ddpm_flax.py (#13024)
    • Improve docstrings and type hints in scheduling_ddpm_parallel.py (#13027)
    • Flag Flax schedulers as deprecated (#13031)
    • docs: improve docstring scheduling_dpm_cogvideox.py (#13044)
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13083)
    • docs: improve docstring scheduling_dpmsolver_multistep_inverse.py (#13085)
    • docs: improve docstring scheduling_edm_dpmsolver_multistep.py (#13122)
    • docs: improve docstring scheduling_flow_match_euler_discrete.py (#13127)
    • docs: improve docstring scheduling_flow_match_heun_discrete.py (#13130)
    • docs: improve docstring scheduling_flow_match_lcm.py (#13160)
    • docs: improve docstring scheduling_ipndm.py (#13198)
  • @yiyixuxu
    • [Modular]z-image (#12808)
    • more update in modular (#12560)
    • [Modular] qwen refactor (#12872)
    • [Modular] better docstring (#12932)
    • [Modular] mellon utils (#12978)
    • Flux2 klein (#12982)
    • [modular] fix a bug in mellon param & improve docstrings (#12980)
    • [modular] add auto_docstring & more doc related refactors (#12958)
    • [modular]support klein (#13002)
    • [Modular]add a real quick start guide (#13029)
    • [Modular] loader related (#13025)
    • [Modular] mellon doc etc (#13051)
    • [modular]simplify components manager doc (#13088)
    • [Modular] refactor Wan: modular pipelines by task etc (#13063)
    • [Modular] guard ModularPipeline.blocks attribute (#13014)
    • [Modular] add different pipeine blocks to init (#13145)
    • fix MT5Tokenizer (#13146)
    • fix guider (#13147)
    • [Modular] update doc for ModularPipeline (#13100)
    • [Modular] add explicit workflow support (#13028)
    • [Modular] update the auto pipeline blocks doc (#13148)
    • [modular] fallback to default_blocks_name when loading base block classes in ModularPipeline (#13193)
    • [modular]Update model card to include workflow (#13195)
    • [modular] not pass trust_remote_code to external repos (#13204)
  • @sayakpaul
    • Fix Qwen Edit Plus modular for multi-image input (#12601)
    • [docs] improve distributed inference cp docs. (#12810)
    • post release 0.36.0 (#12804)
    • Update distributed_inference.md to correct syntax (#12827)
    • [lora] Remove lora docs unneeded and add " # Copied from ..." (#12824)
    • fix the use of device_map in CP docs (#12902)
    • [core] remove unneeded autoencoder methods when subclassing from AutoencoderMixin (#12873)
    • [docs] fix torchao typo. (#12883)
    • Update wan.md to remove unneeded hfoptions (#12890)
    • [modular] Tests for custom blocks in modular diffusers (#12557)
    • [chore] remove controlnet implementations outside controlnet module. (#12152)
    • [core] Handle progress bar and logging in distributed environments (#12806)
    • [modular] error early in enable_auto_cpu_offload (#12578)
    • fix how is_fsdp is determined (#12960)
    • [LoRA] add LoRA support to LTX-2 (#12933)
    • [docs] polish caching docs. (#12684)
    • Z rz rz rz rz rz rz r cogview (#12973)
    • Update distributed_inference.md to reposition sections (#12971)
    • [chore] make transformers version check stricter for glm image. (#12974)
    • add klein docs. (#12984)
    • [core] gracefully error out when attn-backend x cp combo isn't supported. (#12832)
    • make style && make quality
    • Revert "make style && make quality"
    • [chore] make style to push new changes. (#12998)
    • fix Dockerfiles for cuda and xformers. (#13022)
    • [QwenImage] fix prompt isolation tests (#13042)
    • change to CUDA 12.9. (#13045)
    • [wan] fix layerwise upcasting tests on CPU (#13039)
    • [ci] uniform run times and wheels for pytorch cuda. (#13047)
    • [wan] fix wan 2.2 when either of the transformers isn't present. (#13055)
    • [modular] change the template modular pipeline card (#13072)
    • [docs] Fix syntax error in quantization configuration (#13076)
    • [core] make flux hidden states contiguous (#13068)
    • [core] make qwen hidden states contiguous to make torchao happy. (#13081)
    • [modular] add modular tests for Z-Image and Wan (#13078)
    • [lora] fix non-diffusers lora key handling for flux2 (#13119)
    • [modular] add tests for robust model loading. (#13120)
    • fix cosmos transformer typing. (#13134)
    • Sunset Python 3.8 & get rid of explicit typing exports where possible (#12524)
    • feat: implement apply_lora_scale to remove boilerplate. (#12994)
    • [docs] fix ltx2 i2v docstring. (#13135)
    • [tests] accept recompile_limit from the user in tests (#13150)
    • [core] support device type device_maps to work with offloading. (#12811)
    • [core] Enable CP for kernels-based attention backends (#12812)
    • remove deps related to test from ci (#13164)
    • migrate to transformers v5 (#12976)
    • [docs] Fix torchrun command argument order in docs (#13181)
    • [attention backends] use dedicated wrappers from fa3 for cp. (#13165)
    • [tests] consistency tests for modular index (#13192)
    • [chore] updates in the pypi publication workflow. (#12805)
    • [tests] enable cpu offload test in torchao without compilation. (#12704)
    • remove db utils from benchmarking (#13199)
    • [Modular] implement requirements validation for custom blocks (#12196)
    • [lora] fix zimage lora conversion to support for more lora. (#13209)
    • [attention backends] change to updated repo and version. (#13161)
    • Release: v0.37.0-release
  • @DN6
    • [WIP] Add Flux2 modular (#12763)
    • Refactor Model Tests (#12822)
    • [Docs] Add guide for AutoModel with custom code (#13099)
    • [CI] Refactor Wan Model Tests (#13082)
    • [Pipelines] Remove k-diffusion (#13152)
    • [CI] Add ftfy as a test dependency (#13155)
    • [CI] Fix new LoRAHotswap tests (#13163)
    • Allow Automodel to use from_config with custom code. (#13123)
    • [AutoModel] Fix bug with subfolders and local model paths when loading custom code (#13197)
    • [AutoModel] Allow registering auto_map to model config (#13186)
    • [Modular] Save Modular Pipeline weights to Hub (#13168)
    • Clean up accidental files (#13202)
  • @naykun
    • [qwen-image] edit 2511 support (#12839)
    • Qwen Image Layered Support (#12853)
  • @junqiangwu
    • Add support for LongCat-Image (#12828)
    • fix the prefix_token_len bug (#12845)
  • @hlky
    • Z-Image-Turbo ControlNet (#12792)
    • Z-Image-Turbo from_single_file fix (#12888)
    • Detect 2.0 vs 2.1 ZImageControlNetModel (#12861)
    • disable_mmap in pipeline from_pretrained (#12854)
    • ZImageControlNet cfg (#13080)
  • @miguelmartin75
    • Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852)
    • Cosmos Predict2.5 14b Conversion (#12863)
    • Fix typo in src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py (#12914)
    • Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} (#13066)
    • Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)
  • @RuoyiDu
    • Add z-image-omni-base implementation (#12857)
  • @r4inm4ker
    • Community Pipeline: Add z-image differential img2img (#12882)
  • @yaoqih
    • LTX Video 0.9.8 long multi prompt (#12614)
  • @dg845
    • Add LTX 2.0 Video Pipelines (#12915)
    • Add Flag to PeftLoraLoaderMixinTests to Enable/Disable Text Encoder LoRA Tests (#12962)
    • LTX 2 Single File Support (#12983)
    • LTX 2 Improve encode_video by Accepting More Input Types (#13057)
    • Fix LTX-2 Inference when num_videos_per_prompt > 1 and CFG is Enabled (#13121)
    • [CI] Fix setuptools pkg_resources Errors (#13129)
    • [CI] Fix setuptools pkg_resources Bug for PR GPU Tests (#13132)
    • [CI] Revert setuptools CI Fix as the Failing Pipelines are Deprecated (#13149)
    • Fix ftfy import for PRX Pipeline (#13154)
    • Fix AutoModel typing Import Error (#13178)
    • Add Helios-14B Video Generation Pipelines (#13208)
    • Add LTX2 Condition Pipeline (#13058)
  • @kashif
    • [Research] Latent Perceptual Loss (LPL) for Stable Diffusion XL (#11573)
    • Fix QwenImage txt_seq_lens handling (#12702)
    • [Qwen] avoid creating attention masks when there is no padding (#12987)
  • @bhavya01
    • Change timestep device to cpu for xla (#11501)
  • @linoytsaban
    • [LoRA] add lora_alpha to sana README (#11780)
    • Z image lora training (#13056)
  • @stevhliu
    • [docs] Remote inference (#12372)
    • [docs] add docs for qwenimagelayered (#13158)
  • @hameerabbasi
    • Add ChromaInpaintPipeline (#12848)
    • Remove *pooled_* mentions from Chroma inpaint (#13026)
  • @galbria
  • @JaredforReal
    • [GLM-Image] Add batch support for GlmImagePipeline (#13007)
    • [bug fix] GLM-Image fit new get_image_features API (#13052)
    • [Fix]Allow prompt and prior_token_ids to be provided simultaneously in GlmImagePipeline (#13092)
  • @rootonchair
    • LTX2 distilled checkpoint support (#12934)
  • @AlanPonnachan
    • Add support for Magcache (#12744)
  • @CalamitousFelicitousness
    • Feature/zimage inpaint pipeline (#13006)
  • @Ando233
    • feat: implement rae autoencoder. (#13046)