Skip to content

Conversation

@zucchini-nlp
Copy link
Member

What does this PR do?

Fixes #42968, #42641

As discussed, this PR deprecated different dtypes per backbone and falls back to the main dtype if a user is requesting different dtypes.

There is no clean way to support different dtypes when the backbone can have a task head. We either cast hidden states to task dtype in all model files, or we need a workaround to infer task head and force dtype on it when loading. So the better solution is to deprecate dict dtypes, apparently it is not used much

@zucchini-nlp
Copy link
Member Author

run-slow: llava, llava_next, llava_next_video, llava_onevision

@github-actions
Copy link
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/llava", "models/llava_next", "models/llava_next_video", "models/llava_onevision"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Copy link
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Comment on lines 869 to 874
logger.warning_once(
f"The dtype for {sub_config_key}={sub_dtype} is different from the main dtype={main_dtype}. "
"Setting different dtypes per backbone model might cause device errors downstream, setting the "
f"dtype={main_dtype} for all modules. Note, using different dtypes per module is deprecated and will "
"be removed in future versions."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that in the future it will raise an exception, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, after a few released we can switch to an error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that if a user sets dtype="auto", all sub modules will be forcibly cast to the specified main_dtype?

Copy link
Member

@Cyrilvallez Cyrilvallez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, then we need to plainly remove dtype being passed as a dict as well!

Copy link
Member

@Cyrilvallez Cyrilvallez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, if there is not many models with different dtypes in subconfigs, it's indeed better to only show the warning when explicitly passing a dict, and silently skipping otherwise!

@zucchini-nlp zucchini-nlp merged commit 7918b15 into huggingface:main Jan 13, 2026
25 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* deprecate dtypes per sub config

* fix test

* Update tests/utils/test_modeling_utils.py

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

* address comment

---------

Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LlavaNextForConditionalGeneration forward pass broken

6 participants