
Conversation


@Aki-07 Aki-07 commented Nov 29, 2025

What does this PR do?

Fixes #11966

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

@Aki-07 Aki-07 force-pushed the feature/group-offload-pinning branch from 7e50d90 to 3b3813d on November 29, 2025 14:03
@sayakpaul
Member

Thanks for your PR. However, it's being worked on in #12721.


sayakpaul commented Dec 9, 2025

Could we resolve conflicts so that it's a bit easier to review? Seems like there's some overlap from #12692.

@Aki-07 Aki-07 force-pushed the feature/group-offload-pinning branch from 6d96002 to 33d8b52 on December 10, 2025 06:06

Aki-07 commented Dec 10, 2025

Done! Rebased on latest main and resolved conflicts with #12692. Should be much cleaner to review now.


@sayakpaul sayakpaul left a comment


Some initial comments.

Comment on lines 310 to 318
should_synchronize = (
    not self.group.onload_self and self.group.stream is not None and not should_onload_next_group
)
Member

What if non_blocking=True?

Author

Even with non_blocking=True, if a previous group onloaded this one on a side stream, we need a sync before the default stream uses the weights, or we risk reading half-copied tensors. I’ve limited the sync to the record_stream=False case; when record_stream=True, the tensors are tied to the consumer stream, so we can safely skip the sync.
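
For readers following the thread, here is a minimal, self-contained sketch of the hazard in plain PyTorch (illustrative only, not the diffusers hook code): a tensor copied onto the GPU on a side stream with non_blocking=True must be ordered against the consumer stream before it is read.

```python
# Minimal sketch of the side-stream copy hazard, in plain PyTorch.
import torch

if torch.cuda.is_available():
    copy_stream = torch.cuda.Stream()
    weight_cpu = torch.randn(1024, 1024, pin_memory=True)

    with torch.cuda.stream(copy_stream):
        # Asynchronous host-to-device copy issued on the side stream.
        weight_gpu = weight_cpu.to("cuda", non_blocking=True)

    # Make the default stream wait for the copy; this is the role of the sync
    # discussed above for the record_stream=False case.
    torch.cuda.current_stream().wait_stream(copy_stream)

    # Alternatively, weight_gpu.record_stream(torch.cuda.current_stream()) registers
    # the tensor as in use on the consumer stream, which is the mechanism the reply
    # above relies on to skip the explicit sync when record_stream=True.
    out = weight_gpu.sum()  # safe here because of the wait_stream() call above
```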

@bconstantine

Thank you for the initial comments! We are working on the solutions right now.

@Aki-07 Aki-07 force-pushed the feature/group-offload-pinning branch from 6f5887e to 1194a83 on December 11, 2025 01:53
@sayakpaul
Member

@bot /style


github-actions bot commented Dec 11, 2025

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bconstantine

These were the error logs:

_____________________________________________________ AutoencoderKLTests.test_layerwise_casting_memory _____________________________________________________

self = <tests.models.autoencoders.test_models_autoencoder_kl.AutoencoderKLTests testMethod=test_layerwise_casting_memory>

    @require_torch_accelerator
    @torch.no_grad()
    def test_layerwise_casting_memory(self):
        MB_TOLERANCE = 0.2
        LEAST_COMPUTE_CAPABILITY = 8.0

        def reset_memory_stats():
            gc.collect()
            backend_synchronize(torch_device)
            backend_empty_cache(torch_device)
            backend_reset_peak_memory_stats(torch_device)

        def get_memory_usage(storage_dtype, compute_dtype):
            torch.manual_seed(0)
            config, inputs_dict = self.prepare_init_args_and_inputs_for_common()
            inputs_dict = cast_maybe_tensor_dtype(inputs_dict, torch.float32, compute_dtype)
            model = self.model_class(**config).eval()
            model = model.to(torch_device, dtype=compute_dtype)
            model.enable_layerwise_casting(storage_dtype=storage_dtype, compute_dtype=compute_dtype)

            reset_memory_stats()
            model(**inputs_dict)
            model_memory_footprint = model.get_memory_footprint()
            peak_inference_memory_allocated_mb = backend_max_memory_allocated(torch_device) / 1024**2

            return model_memory_footprint, peak_inference_memory_allocated_mb

        fp32_memory_footprint, fp32_max_memory = get_memory_usage(torch.float32, torch.float32)
        fp8_e4m3_fp32_memory_footprint, fp8_e4m3_fp32_max_memory = get_memory_usage(torch.float8_e4m3fn, torch.float32)
        fp8_e4m3_bf16_memory_footprint, fp8_e4m3_bf16_max_memory = get_memory_usage(
            torch.float8_e4m3fn, torch.bfloat16
        )

        compute_capability = get_torch_cuda_device_capability() if torch_device == "cuda" else None
        self.assertTrue(fp8_e4m3_bf16_memory_footprint < fp8_e4m3_fp32_memory_footprint < fp32_memory_footprint)
        # NOTE: the following assertion would fail on our CI (running Tesla T4) due to bf16 using more memory than fp32.
        # On other devices, such as DGX (Ampere) and Audace (Ada), the test passes. So, we conditionally check it.
        if compute_capability and compute_capability >= LEAST_COMPUTE_CAPABILITY:
>           self.assertTrue(fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory)
E           AssertionError: False is not true

tests\models\test_modeling_common.py:1757: AssertionError
_____________________________________________ AutoencoderKLTests.test_lora_adapter_wrong_metadata_raises_error _____________________________________________ 

self = <tests.models.autoencoders.test_models_autoencoder_kl.AutoencoderKLTests testMethod=test_lora_adapter_wrong_metadata_raises_error>

    @torch.no_grad()
    @unittest.skipIf(not is_peft_available(), "Only with PEFT")
    def test_lora_adapter_wrong_metadata_raises_error(self):
        from peft import LoraConfig

        from diffusers.loaders.lora_base import LORA_ADAPTER_METADATA_KEY
        from diffusers.loaders.peft import PeftAdapterMixin

        init_dict, _ = self.prepare_init_args_and_inputs_for_common()
        model = self.model_class(**init_dict).to(torch_device)

        if not issubclass(model.__class__, PeftAdapterMixin):
            pytest.skip(f"PEFT is not supported for this model ({model.__class__.__name__}).")

        denoiser_lora_config = LoraConfig(
            r=4,
            lora_alpha=4,
            target_modules=["to_q", "to_k", "to_v", "to_out.0"],
            init_lora_weights=False,
            use_dora=False,
        )
        model.add_adapter(denoiser_lora_config)
        self.assertTrue(check_if_lora_correctly_set(model), "LoRA layers not set correctly")

        with tempfile.TemporaryDirectory() as tmpdir:
            model.save_lora_adapter(tmpdir)
            model_file = os.path.join(tmpdir, "pytorch_lora_weights.safetensors")
            self.assertTrue(os.path.isfile(model_file))

            # Perturb the metadata in the state dict.
            loaded_state_dict = safetensors.torch.load_file(model_file)
            metadata = {"format": "pt"}
            lora_adapter_metadata = denoiser_lora_config.to_dict()
            lora_adapter_metadata.update({"foo": 1, "bar": 2})
            for key, value in lora_adapter_metadata.items():
                if isinstance(value, set):
                    lora_adapter_metadata[key] = list(value)
            metadata[LORA_ADAPTER_METADATA_KEY] = json.dumps(lora_adapter_metadata, indent=2, sort_keys=True)
>           safetensors.torch.save_file(loaded_state_dict, model_file, metadata=metadata)

tests\models\test_modeling_common.py:1315:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  

tensors = {'decoder.mid_block.attentions.0.to_k.lora_A.weight': tensor([[-0.0836,  0.4591, -0.4989, -0.0175],
        [-0.2300, ...,  0.2900,  0.3015],
        [ 0.2199,  0.0162, -0.3994, -0.0383],
        [ 0.2069,  0.4327, -0.3422, -0.0724]]), ...}
filename = 'C:\\Users\\Bryan\\AppData\\Local\\Temp\\tmpbwk5t9sf\\pytorch_lora_weights.safetensors'
metadata = {'format': 'pt', 'lora_adapter_metadata': '{\n  "alora_invocation_tokens": null,\n  "alpha_pattern": {},\n  "arrow_con...e": null,\n  "trainable_token_indices": null,\n  "use_dora": false,\n  "use_qalora": false,\n  "use_rslora": false\n}'}

    def save_file(
        tensors: Dict[str, torch.Tensor],
        filename: Union[str, os.PathLike],
        metadata: Optional[Dict[str, str]] = None,
    ):
        """
        Saves a dictionary of tensors into raw bytes in safetensors format.

        Args:
            tensors (`Dict[str, torch.Tensor]`):
                The incoming tensors. Tensors need to be contiguous and dense.
            filename (`str`, or `os.PathLike`)):
                The filename we're saving into.
            metadata (`Dict[str, str]`, *optional*, defaults to `None`):
                Optional text only metadata you might want to save in your header.
                For instance it can be useful to specify more about the underlying
                tensors. This is purely informative and does not affect tensor loading.

        Returns:
            `None`

        Example:

        ```python
        from safetensors.torch import save_file
        import torch

        tensors = {"embedding": torch.zeros((512, 1024)), "attention": torch.zeros((256, 256))}
        save_file(tensors, "model.safetensors")
        ```
        """
>       serialize_file(_flatten(tensors), filename, metadata=metadata)
E       safetensors_rust.SafetensorError: Error while serializing: I/O error: The requested operation cannot be performed on a file with a user-mapped section open. (os error 1224)

C:\Users\Bryan\miniconda3\envs\diffusers_contrib\lib\site-packages\safetensors\torch.py:307: SafetensorError
________________________________________________________ AutoencoderKLTests.test_output_pretrained _________________________________________________________ 

self = <tests.models.autoencoders.test_models_autoencoder_kl.AutoencoderKLTests testMethod=test_output_pretrained>

    def test_output_pretrained(self):
        model = AutoencoderKL.from_pretrained("fusing/autoencoder-kl-dummy")
        model = model.to(torch_device)
        model.eval()

        # Keep generator on CPU for non-CUDA devices to compare outputs with CPU result tensors
        generator_device = "cpu" if not torch_device.startswith(torch_device) else torch_device
        if torch_device != "mps":
            generator = torch.Generator(device=generator_device).manual_seed(0)
        else:
            generator = torch.manual_seed(0)

        image = torch.randn(
            1,
            model.config.in_channels,
            model.config.sample_size,
            model.config.sample_size,
            generator=torch.manual_seed(0),
        )
        image = image.to(torch_device)
        with torch.no_grad():
            output = model(image, sample_posterior=True, generator=generator).sample

        output_slice = output[0, -1, -3:, -3:].flatten().cpu()

        # Since the VAE Gaussian prior's generator is seeded on the appropriate device,
        # the expected output slices are not the same for CPU and GPU.
        if torch_device == "mps":
            expected_output_slice = torch.tensor(
                [
                    -4.0078e-01,
                    -3.8323e-04,
                    -1.2681e-01,
                    -1.1462e-01,
                    2.0095e-01,
                    1.0893e-01,
                    -8.8247e-02,
                    -3.0361e-01,
                    -9.8644e-03,
                ]
            )
        elif generator_device == "cpu":
            expected_output_slice = torch.tensor(
                [
                    -0.1352,
                    0.0878,
                    0.0419,
                    -0.0818,
                    -0.1069,
                    0.0688,
                    -0.1458,
                    -0.4446,
                    -0.0026,
                ]
            )
        else:
            expected_output_slice = torch.tensor(
                [
                    -0.2421,
                    0.4642,
                    0.2507,
                    -0.0438,
                    0.0682,
                    0.3160,
                    -0.2018,
                    -0.0727,
                    0.2485,
                ]
            )

>       self.assertTrue(torch_all_close(output_slice, expected_output_slice, rtol=1e-2))

tests\models\autoencoders\test_models_autoencoder_kl.py:171:
E           AssertionError: Max diff is absolute 0.000513467937707901. Diff tensor is tensor([4.3437e-05, 6.3509e-05, 2.4503e-04, 5.1347e-04, 3.8743e-05, 2.0981e-04,
E                   1.7959e-04, 1.4303e-04, 2.2203e-06]).

tests\testing_utils.py:129: AssertionError
------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------
An error occurred while trying to fetch fusing/autoencoder-kl-dummy: fusing/autoencoder-kl-dummy does not appear to have a file named diffusion_pytorch_model.safetensors.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
================================================================= short test summary info ==================================================================
FAILED tests/models/autoencoders/test_models_autoencoder_kl.py::AutoencoderKLTests::test_layerwise_casting_memory - AssertionError: False is not true
FAILED tests/models/autoencoders/test_models_autoencoder_kl.py::AutoencoderKLTests::test_lora_adapter_wrong_metadata_raises_error - safetensors_rust.SafetensorError: Error while serializing: I/O error: The requested operation cannot be performed on a file with a user-mapped section o...
FAILED tests/models/autoencoders/test_models_autoencoder_kl.py::AutoencoderKLTests::test_output_pretrained - AssertionError: Max diff is absolute 0.000513467937707901. Diff tensor is tensor([4.3437e-05, 6.3509e-05, 2.4503e-04, 5.1347e-04, 3.8743e-05, 2.0981e-04,
======================================================== 3 failed, 50 passed, 22 skipped in 19.32s =========================================================

Of the three failures, one is an I/O serialization error, one is the memory check (I read in the comments that this assertion usually passes only on Ampere and Ada environments, neither of which is my current environment), and one is a slight output difference in test_output_pretrained.

@bconstantine

@sayakpaul Also, with the current checks, it looks like there is a coding style error. Can you help us run the automatic style correction?

@sayakpaul
Member

@bot /style

@github-actions
Contributor

Style fix is beginning... View the workflow run here.

@sayakpaul
Member

The style bot cannot automatically do it. See:
https://github.com/huggingface/diffusers/actions/runs/20191712208/job/57970027213

I would recommend the following:

  1. Create a fresh Python env.
  2. Run pip install -e ".[style]" from the root of the repository directory.
  3. Run make style && make quality.


Aki-07 commented Dec 13, 2025

Thanks for the pointer @sayakpaul

@sayakpaul
Member

@Aki-07 @bconstantine I ran those failing tests on my end with this branch and also on main. I didn't notice any failures.

@bconstantine

@sayakpaul thank you for testing! Glad to hear there were no failures on your end.


Aki-07 commented Dec 13, 2025

Hey @sayakpaul, the WanVACE LoRA failures came from the hook offloading immediately when it was attached. It saved the weights before LoRA was added and then put them back later, so the adapters never took effect. I removed that eager offload so the first offload happens after the adapters are loaded. I would need your help to re-run the pipeline tests.
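
As a toy, self-contained analogue of that ordering bug (not the actual diffusers hook; the SnapshotHook class below is purely hypothetical): an offload that snapshots the weights at attach time will silently discard any adapter weights added afterwards.

```python
# Toy analogue of the eager-offload bug: a hook that snapshots weights "too early".
import torch
import torch.nn as nn

class SnapshotHook:
    """Hypothetical stand-in for an offload hook: saves weights, restores them at forward time."""

    def __init__(self, module: nn.Module, eager: bool):
        self.module = module
        # eager=True mirrors the buggy behaviour: snapshot taken at attach time.
        self.snapshot = self._take_snapshot() if eager else None
        module.register_forward_pre_hook(self._pre_forward)

    def _take_snapshot(self):
        return {k: v.detach().clone() for k, v in self.module.state_dict().items()}

    def _pre_forward(self, module, args):
        if self.snapshot is None:
            # Fixed behaviour: snapshot lazily, after adapters have been added.
            self.snapshot = self._take_snapshot()
        module.load_state_dict(self.snapshot)  # "onload" restores the snapshot

model = nn.Linear(4, 4, bias=False)
hook = SnapshotHook(model, eager=True)  # buggy: snapshot precedes the adapter update
with torch.no_grad():
    model.weight.add_(1.0)              # stand-in for injecting LoRA weights
model(torch.randn(1, 4))                # pre-forward restores the pre-adapter snapshot
print(model.weight.abs().max())         # the +1.0 update has been silently undone
```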

return send_to_device(kwargs, self.group.onload_device, non_blocking=self.group.non_blocking)

return args, kwargs
def _is_group_on_device(self) -> bool:
We have removed the duplicate definition of _is_group_on_device.


# Ensure the top-level module also has a group_offloading hook so hook presence checks pass,
# even when it holds no parameters/buffers itself.
if config.stream is None:
Collaborator

Why do we need this?

Author

Even when all the real groups sit in child modules, the root needs a group_offloading hook so the model stays marked as offloaded. That keeps the guardrails working: _is_group_offload_enabled still blocks .to()/.cuda() and conflicting offloads, and the reapply/remove logic finds the hook. Without it, a wrapper with no parameters of its own would look un-offloaded and could be moved or re-offloaded into a bad state.
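
A minimal, self-contained sketch of why that matters, using a hypothetical marker attribute instead of the real hook registry:

```python
# Sketch only: HOOK_ATTR stands in for the real group_offloading hook, and
# is_group_offloaded for checks like _is_group_offload_enabled.
import torch.nn as nn

HOOK_ATTR = "_group_offload_hook"  # hypothetical marker

def mark_group_offloaded(module: nn.Module) -> None:
    setattr(module, HOOK_ATTR, True)

def is_group_offloaded(root: nn.Module) -> bool:
    # Guardrails such as blocking .to()/.cuda() inspect the module they are called
    # on, i.e. the top-level wrapper, not its children.
    return getattr(root, HOOK_ATTR, False)

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
for child in model.children():
    mark_group_offloaded(child)        # the real offload groups live on the children

assert not is_group_offloaded(model)   # root looks un-offloaded, guardrails won't trigger
mark_group_offloaded(model)            # hence the explicit hook on the top-level module
assert is_group_offloaded(model)
```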

@bconstantine

bconstantine commented Dec 17, 2025

Hi @DN6 @sayakpaul We’ve updated the fix according to the review. Could you take a quick look and share any feedback when you have a moment? Thank you in advance!


Aki-07 commented Dec 22, 2025

Hey @DN6 @sayakpaul, as mentioned above, we have addressed the comments. Could you guide us on the next steps?


@sayakpaul sayakpaul left a comment


Thanks for all the work on this PR. There are a couple of things that feel quite confusing to me. So, I would appreciate some explanations.



# Model with only standalone computational layers at top level
class DummyModelWithStandaloneLayers(ModelMixin):
Member

Why is this being deleted?

The rest of the diffs in this testing script are honestly a bit difficult to follow. Could we keep this cleaner?

Author

Thanks for pointing this out. The class was not intentionally deleted. From the git history, this shows up as removed due to branch history / rebase artifacts while integrating changes (rather than a deliberate change to the test itself), which makes the diff noisier than it should be. I’m cleaning this up now: I’ll restore that block and reorganize the commits so the test diffs are more focused/atomic and easier to review.

pinned_dict = None

def _transfer_tensor_to_device(self, tensor, source_tensor, default_stream):
def _transfer_tensor_to_device(self, tensor, source_tensor, default_stream=None):
Member

Why do we have to set a default for default_stream?

Author

I made it optional because the non-stream path calls _process_tensors_from_modules without a stream; there is nothing to record in that case, and record_stream is gated. None is a safety net for the record call, and it saves passing a placeholder from those call sites. If you prefer the stricter signature, I can keep it required and pass None explicitly where we don’t use streams. Please correct me if my understanding is off and this needs to change.

Member

Let's stick to the existing implementation in this case, i.e., a stricter signature.

Member

Let's stick to a stricter signature.

_apply_group_offloading_hook(module, unmatched_group, config=config)
else:
_apply_lazy_group_offloading_hook(module, unmatched_group, config=config)
elif config.stream is None and config.offload_to_disk_path is None:
Member

This seems unnecessary. Explain?

Author

I originally added the empty root hook to tag the top module as offloaded when everything else was matched, but it did not change behaviour: the child hooks already mark the model as group-offloaded, and the guardrails rely on those. It just added an empty group and potentially extra files, so I have removed it to simplify. Functionally, nothing depends on it.

low_cpu_mem_usage=config.low_cpu_mem_usage,
onload_self=True,
group_id=name,
group_id=f"{config.module_prefix}{name}",
Member

What's happening here?

Author

It is the same thing as above: we prefix group_id with the parent name to avoid id collisions when recursing into block_modules. The root prefix stays empty to preserve existing ids; the prefix only appears when descending into children.
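
A tiny sketch of the collision the prefix avoids (the module names below are hypothetical):

```python
# Two different parents can both contain a child named "block"; without the parent
# prefix their group ids collide.
def make_group_id(module_prefix: str, name: str) -> str:
    return f"{module_prefix}{name}"

unprefixed = {make_group_id("", "block"), make_group_id("", "block")}
prefixed = {make_group_id("encoder.", "block"), make_group_id("decoder.", "block")}

assert len(unprefixed) == 1  # both children collapse onto the same id
assert len(prefixed) == 2    # prefixing with the parent keeps the ids distinct
```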

@Aki-07 Aki-07 force-pushed the feature/group-offload-pinning branch from 8403860 to 2e8f538 on January 4, 2026 16:59

Aki-07 commented Jan 4, 2026

@sayakpaul thank you for all your comments, and sorry for the delay in resolving them. They have all been addressed; please let us know your review.

Comment on lines 536 to 540
if isinstance(pin_groups, str) and pin_groups in VALID_PIN_GROUPS:
    return pin_groups
raise ValueError(
    f"`pin_groups` must be None, {', '.join(repr(v) for v in sorted(VALID_PIN_GROUPS))}, or a callable."
)
Member

Suggested change
-if isinstance(pin_groups, str) and pin_groups in VALID_PIN_GROUPS:
-    return pin_groups
-raise ValueError(
-    f"`pin_groups` must be None, {', '.join(repr(v) for v in sorted(VALID_PIN_GROUPS))}, or a callable."
-)
+elif isinstance(pin_groups, str) and pin_groups not in VALID_PIN_GROUPS:
+    raise ValueError(
+        f"`pin_groups` must be None, {', '.join(repr(v) for v in sorted(VALID_PIN_GROUPS))}, or a callable."
+    )
+return pin_groups

Author

done


def initialize_hook(self, module: torch.nn.Module) -> torch.nn.Module:
    if self.group.offload_leader == module:
        # For disk offload we materialize the safetensor files upfront so callers can inspect them immediately.
Member

Can you clarify this scenario in the comments as well? And provide a small example that justifies this change?
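
For context, a small sketch of the scenario that comment refers to, with hypothetical names (enable_disk_group_offload is not a real API call here):

```python
# If the safetensors files were only written lazily on the first offload, a caller
# inspecting the offload directory right after enabling disk offload would find it empty.
import os

def list_offloaded_files(offload_dir: str) -> list:
    return sorted(f for f in os.listdir(offload_dir) if f.endswith(".safetensors"))

# enable_disk_group_offload(model, offload_dir)  # hypothetical call that attaches the hooks
# files = list_offloaded_files(offload_dir)      # expected to be non-empty immediately,
#                                                # i.e. before any forward pass has run
```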

Comment on lines -304 to -306
# If the current module is the onload_leader of the group, we onload the group if it is supposed
# to onload itself. In the case of using prefetching with streams, we onload the next group if
# it is not supposed to onload itself.
Member

(nit): let's not get rid of the important comments.

Author

done

not self.group.onload_self
and self.group.stream is not None
and not should_onload_next_group
and not self.group.record_stream
Member

Could I get a clarification on why this condition needs to be modified?

Author

I did not change the condition; those same four checks were already present in both branches. I consolidated them into one place to avoid duplication. We still only sync when the group did not onload itself, we are using a stream, there is no pending prefetch, and record_stream is not handling lifetime tracking.
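
For reference, the consolidated condition with each check annotated (same attribute names as the snippet quoted above; this is a restatement of that code, not new behaviour):

```python
should_synchronize = (
    not self.group.onload_self          # this group was onloaded by the previous group's prefetch
    and self.group.stream is not None   # a transfer stream is actually in use
    and not should_onload_next_group    # no prefetch of the next group will handle ordering
    and not self.group.record_stream    # record_stream is not tracking the tensors' lifetime
)
```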

if self.group.offload_leader == module:
self.group.offload_()
return output

Member

This part of the diff reads as very confusing to me and is hence a bit hard to review confidently. It seems to me post_forward() was just moved up and _send_kwargs_to_device() was added (and I am not sure why), amongst other things. Possible to have a cleaner diff?

Author

Thanks for the flag. I simplified the diff: I have removed _send_kwargs_to_device, and the kwargs handling is back inline in pre_forward as before.

Comment on lines 561 to 564
Args:
    pin_groups (`"first_last"` | `"all"` | `Callable`, *optional*):
        Optionally keep selected groups on the onload device permanently. See
        [`~hooks.group_offloading.apply_group_offloading`] for details.
Member

Are we just documenting pin_groups here? If so, we should remove that from here as apply_group_offloading() should already cover it:

https://github.com/bconstantine/diffusers/blob/335dca80fb2cdbdab7c2daa298c3ca934d7107b6/src/diffusers/models/modeling_utils.py#L542

Author

Removed the redundant docstring.

# keys to ignore when AlignDeviceHook moves inputs/outputs between devices
# these are shared mutable state modified in-place
_skip_keys = ["feat_cache", "feat_idx"]
_group_offload_block_modules = ["quant_conv", "post_quant_conv", "encoder", "decoder"]
Member

Let's also add a comment on how these modules were chosen to be included here.

@Aki-07 Aki-07 Jan 8, 2026

Sure, added a comment.

@sayakpaul
Member

@seed93 would you like to test it?


Aki-07 commented Jan 8, 2026

Thanks again @sayakpaul for the detailed review! I have addressed all the points.



Development

Successfully merging this pull request may close these issues.

How about forcing the first and last block on device when groupoffloading is used?
