
Conversation

@zucchini-nlp (Member) commented Dec 22, 2025

What does this PR do?

Continues #42958 and makes sure that the opposite case, where the config has a non-default value saved but the user wants to set it back to a default boolean, is also covered.
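
For context, a minimal sketch of the scenario being covered (the saved value is hypothetical, and GenerationConfig.update is used here purely for illustration rather than the PR's actual code path):

from transformers import GenerationConfig

# Pretend this config was loaded from the Hub with a non-default boolean saved.
saved = GenerationConfig(do_sample=True)

# The user explicitly sets the flag back to the library default; that explicit False
# must not be silently dropped in favour of the saved True.
saved.update(do_sample=False)
assert saved.do_sample is False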

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member Author):

run-slow: llama, gemma2, qwen2, bart, t5, whisper

@github-actions (Contributor):

This comment contains run-slow, running the specified jobs:

models: ["models/bart", "models/gemma2", "models/llama", "models/qwen2", "models/t5", "models/whisper"]
quantizations: []

@github-actions (Contributor):

CI Results

Workflow Run ⚙️

Model CI Report

❌ Failed tests

  • bart:
    tests/models/bart/test_modeling_bart.py::BartModelIntegrationTests::test_xsum_config_generation_params

Comment on lines +1126 to +1127
if key == "use_cache":
continue # common key for most models
@zucchini-nlp (Member Author) commented on Dec 23, 2025:

btw, could we remove use_cache as a key from the config and instead keep it only in generation_config? Imo it isn't really a model attribute.

Contributor:

Hmm, unsure. I understand the point, but that would be hugely breaking for a lot of users and external libraries. On the other hand, we could include this in v5 if you feel motivated 🤔

Member Author:

it will definitely break stuff 😿 Just a thought for now

Member Author:

So, this is a workaround we have so that we don't throw errors about the model config containing generation params. We won't throw errors anyway if use_cache is in the model config's signature, but unfortunately many configs don't have it documented.

Contributor:

Gotcha, maybe at some point we can 👀

Comment on lines 754 to 755
if getattr(self.model, "config", None) is not None:
self.model.config.use_cache = self.args.use_cache
if getattr(self.model, "generation_config", None) is not None:
self.model.generation_config.use_cache = self.args.use_cache
@zucchini-nlp (Member Author) commented on Dec 23, 2025:

The model config might not contain use_cache unless it's in the config's own init; we don't assign generation attributes in the base config class anymore.

So checking generation_config is more reliable.

Contributor:

What's the current logic here then? kwarg > config > generation config? Seems like another case of bad relations here 😭

It does feel a bit out of place to me, though. I'd like to keep this PR strictly to the default values; this semi-progresses the point of removing reliance on use_cache in the config. On another note, we use the check_model_inputs decorator with the config's use_cache, so it might be hard to disentangle this completely.

Member Author:

We are not removing use_cache yet in this PR. I hacked it via the above check, which doesn't include use_cache in the generation params.

I will revert it back, I forgot to do so after testing some stuff.

Member Author:

The logic stays as is, so yes: kwarg > config > generation config.
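
A minimal sketch of that precedence (resolve_use_cache is a hypothetical helper for illustration, not a Transformers API):

def resolve_use_cache(kwarg_value, config, generation_config):
    # An explicit kwarg always wins.
    if kwarg_value is not None:
        return kwarg_value
    # Then the model config, if it defines the key.
    if getattr(config, "use_cache", None) is not None:
        return config.use_cache
    # Finally fall back to the generation config.
    return getattr(generation_config, "use_cache", None)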

@zucchini-nlp zucchini-nlp changed the title [WIP] Generation config boolean defaults Generation config boolean defaults Dec 23, 2025
@zucchini-nlp zucchini-nlp requested review from ArthurZucker and vasqu and removed request for vasqu December 23, 2025 08:27
@vasqu (Contributor) left a comment:

Left a few comments. I'd be pro reverting the use_cache changes (at least for now), see my last comment.

Also do we still need

# Due to some values being boolean and not `None`, we need additional logic to overwrite
# them explicitly (`defaults_only=False`) on the condition that it's only a previous
# default value
default_generation_config = GenerationConfig()
generation_config.update(
    **{
        k: v
        for k, v in self.generation_config.to_dict().items()
        if isinstance(v, bool)
        and hasattr(default_generation_config, k)
        and getattr(generation_config, k, None) == getattr(default_generation_config, k)
    }
)

I added this only because these were not None within their defaults, but now we should update them properly without that workaround, no?

Last note, semi-unrelated, let's also keep track of https://github.com/huggingface/transformers/pull/42702/files#r2635806800 in another PR cc @ebezzam

  generation_mode = GenerationMode.CONSTRAINED_BEAM_SEARCH
  elif self.num_beams is None or self.num_beams == 1:
- if not self.do_sample:
+ if self.do_sample is not True:
Contributor:

Suggested change:
- if self.do_sample is not True:
+ if self.do_sample is False:

super nit, seen this multiple times

Member:

Actually, it should be if not self.do_sample haha

Member Author:

if not do_sample will result in false positives when do_sample is None 😢 We have to explicitly check for is not True, since a value of None is the same as having False, i.e. the new default value.
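
For reference, a small truth table of the two candidate checks under plain Python semantics (an illustration added here, not part of the PR):

# do_sample    do_sample is False    do_sample is not True
# True         False                 False
# False        True                  True
# None         False                 True   <- `is False` would skip the None case,
#                                              even though None should behave like the
#                                              new default (False)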

Contributor:

Can we add a small comment then? That's such a weird case lol.

Contributor:

Are these changes really relevant? Seems unrelated to me

Member Author:

Some tests were failing without it because the re-loaded model has an incorrect config type, so we couldn't find the generation keys in it. It was luckily passing on the main branch.

Contributor:

Ok, then let's keep it; it's not wrong, I was just surprised.

@zucchini-nlp zucchini-nlp requested a review from vasqu January 8, 2026 12:31
@vasqu (Contributor) left a comment:

LGTM, could you just double check if we still need

# Due to some values being boolean and not `None`, we need additional logic to overwrite
# them explicitly (`defaults_only=False`) on the condition that it's only a previous
# default value
default_generation_config = GenerationConfig()
generation_config.update(
    **{
        k: v
        for k, v in self.generation_config.to_dict().items()
        if isinstance(v, bool)
        and hasattr(default_generation_config, k)
        and getattr(generation_config, k, None) == getattr(default_generation_config, k)
    }
)

Now that these values default to None as well, I suspect we might not need it anymore.


@zucchini-nlp (Member Author):

> LGTM, could you just double check if we still need

My bad, I didn't see the comment on it. Not needed!

@github-actions bot commented Jan 8, 2026:

[For maintainers] Suggested jobs to run (before merge)

run-slow: bart, clvp

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) January 8, 2026 13:24
@zucchini-nlp zucchini-nlp merged commit 93d7aff into huggingface:main Jan 8, 2026
25 checks passed
Comment on lines -387 to +392
- self.num_return_sequences = kwargs.pop("num_return_sequences", 1)
- self.output_attentions = kwargs.pop("output_attentions", False)
- self.output_hidden_states = kwargs.pop("output_hidden_states", False)
- self.output_scores = kwargs.pop("output_scores", False)
- self.output_logits = kwargs.pop("output_logits", False)
- self.return_dict_in_generate = kwargs.pop("return_dict_in_generate", False)
+ self.num_return_sequences = kwargs.pop("num_return_sequences", None)
+ self.output_attentions = kwargs.pop("output_attentions", None)
+ self.output_hidden_states = kwargs.pop("output_hidden_states", None)
+ self.output_scores = kwargs.pop("output_scores", None)
+ self.output_logits = kwargs.pop("output_logits", None)
+ self.return_dict_in_generate = kwargs.pop("return_dict_in_generate", None)
Member:

Sorry, I'm a bit lost on the motivation for the PR and for setting those values to None (which is usually not a super good idea if they are boolean by essence). Could you give me a bit of context on the PR?

Member Author:

The motivation was to fix a ton of edge cases when creating a generation config. In the past we had an ugly workaround to allow users to change a generation param back to its default if a different value was saved on the Hub (for example #42762).

That was fixed in #42702, where I deleted a lot of code and defined a clear priority list. The next step will be to disallow users from passing both a generation config and kwargs, and instead ask them to put all kwargs in the config. The below is accepted right now and is bad practice imo:

conf = GenerationConfig(do_sample=False)
text = model.generate(input_ids, generation_config=conf, max_new_tokens=128, temperature=1.6)
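
To illustrate why the defaults moved from booleans to None (a rough sketch of the idea, not the library's actual resolution code):

def merge(saved_value, user_value):
    # None means "the user did not touch this flag", so the saved value survives;
    # an explicit boolean (even one equal to the library default) overrides it.
    return saved_value if user_value is None else user_value

saved_do_sample = True                # non-default value stored on the Hub
print(merge(saved_do_sample, None))   # True  -> user passed nothing, keep the saved value
print(merge(saved_do_sample, False))  # False -> user explicitly reset to the default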

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* push

* fix a few tests

* why no tests are failing, suspicious

* one tiny fix

* maybe?

* typo

* clvp has incorrect config types

* nit

* docstring and revert unrelated trainer

* forgot to revert this one

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* delete the prev fix and add small comment

---------

Co-authored-by: Anton Vlasjuk <[email protected]>