
Conversation

@pallavisharma6802

What does this PR do?

Fixes a crash when using device_map="auto" in environments where "cpu" is missing from the inferred max_memory dictionary.

Bug

When max_memory is not explicitly provided, accelerate.get_max_memory() may return a dictionary without a "cpu" key (depending on the backend/environment).
_get_device_map then unconditionally scales "cpu" memory, resulting in a KeyError.
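
For illustration, the failure mode looks roughly like this (a sketch only: the scaling line stands in for whatever _get_device_map does with the "cpu" budget, and the exact factor is a placeholder):

```python
from accelerate.utils import get_max_memory

# This normally returns per-device budgets plus a "cpu" entry, e.g.
# {0: <gpu_bytes>, "cpu": <cpu_bytes>}; in the affected environment the
# "cpu" key was absent.
inferred_max_memory = get_max_memory()

# _get_device_map then adjusts the CPU budget unconditionally, along these lines:
inferred_max_memory["cpu"] = int(inferred_max_memory["cpu"] * 0.9)
# -> KeyError: 'cpu' when the key is missing
```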

Fix

  • Guard access to "cpu" in inferred_max_memory before applying the scaling factor (see the sketch after this list).
  • Preserve existing behavior when "cpu" is present.
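
A minimal sketch of the guard, as referenced above (a fragment showing the shape of the change inside _get_device_map; inferred_max_memory is the dictionary returned by get_max_memory(), and the 0.9 factor is a placeholder, not the exact source):

```python
# Only adjust the CPU budget if Accelerate actually reported one.
if "cpu" in inferred_max_memory:
    # Behavior is unchanged when "cpu" is present; 0.9 is a placeholder factor.
    inferred_max_memory["cpu"] = int(inferred_max_memory["cpu"] * 0.9)
# When "cpu" is absent, leave the dictionary untouched instead of raising KeyError.
```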

Tests

  • Added tests/utils/test_device_map_cpu_guard.py to ensure device_map="auto" does not crash when "cpu" is absent from max_memory (a self-contained sketch of the asserted behavior follows).
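
The real test exercises device-map inference end to end; as a self-contained illustration of what it asserts, _scale_cpu_budget below is a hypothetical stand-in for the guarded logic, not the actual helper:

```python
def _scale_cpu_budget(max_memory, factor=0.9):
    # Hypothetical stand-in for the guarded scaling inside _get_device_map.
    if "cpu" in max_memory:
        max_memory["cpu"] = int(max_memory["cpu"] * factor)
    return max_memory

def test_missing_cpu_key_does_not_raise():
    # Previously this path raised KeyError: 'cpu'.
    assert _scale_cpu_budget({0: 8 * 1024**3}) == {0: 8 * 1024**3}

def test_cpu_budget_still_scaled_when_present():
    # Existing behavior is preserved when "cpu" is reported.
    assert _scale_cpu_budget({"cpu": 100})["cpu"] == 90
```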

Fixes #42994

Before submitting

  • Did you read the contributor guideline?
  • Did you write any new necessary tests?

Who can review?

@Cyrilvallez

@pallavisharma6802
Author

Thanks for the report! This PR adds a defensive fix for the device_map="auto" crash and includes a regression test.
Happy to make any adjustments if needed.

@ArthurZucker
Collaborator

When max_memory is not explicitly provided, accelerate.get_max_memory() may return a dictionary without a "cpu" key (depending on the backend/environment).
_get_device_map then unconditionally scales "cpu" memory, resulting in a KeyError.

Sounds like accelerate should fix this, no?

@pallavisharma6802
Author

Good point! This assumption does indeed originate from accelerate.get_max_memory().

That said, _get_device_map in Transformers currently treats the returned max_memory as an internal contract and applies additional logic on top of it, so guarding here avoids a hard crash regardless of how max_memory is produced (including by older or external Accelerate versions).

I agree that Accelerate could also be hardened, and I’m happy to open a follow-up issue/PR there if that’s preferred. In the meantime, this keeps Transformers robust against missing "cpu" entries without changing behavior when it is present.

@Cyrilvallez
Member

Hmm, could you provide an example of a situation where "cpu" is not present in inferred_max_memory? I don't think it can be missing 🤔

@pallavisharma6802
Author

Thanks for the question. That’s fair!

I don’t currently have a minimal standalone repro showing "cpu" missing directly from accelerate.get_max_memory() itself. What I observed was the KeyError happening in Transformers’ _get_device_map when using the inferred max_memory in my setup, which suggests "cpu" was absent at that stage. But I agree that if get_max_memory() guarantees a "cpu" entry by contract, then this guard may be too broad or misplaced.

If "cpu" is indeed invariant on the Accelerate side, I’m happy to adjust this PR to better reflect that contract (e.g. narrow the guard to the specific failing path or replace it with an assertion + update the test). I can also try to extract a minimal repro to confirm where "cpu" drops out (if it does). Let me know what you’d prefer.
