Check model inputs - hidden states #40994
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hi @zucchini-nlp. Are you still working on this PR? Please let me know when it's ready, I'd like to verify it. Thanks!

It should be working with VLMs, I just need to fix CI and code style. I'll take it up after finishing one huge PR I'm working on currently.
```python
@unittest.skip(
    reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
)
def test_training_gradient_checkpointing(self):
    pass

@unittest.skip(
    reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
)
def test_training_gradient_checkpointing_use_reentrant(self):
    pass

@unittest.skip(
    reason="This architecture seems to not compute gradients for the last vision-layernorm because the model uses hidden states pre-norm"
)
def test_training_gradient_checkpointing_use_reentrant_false(self):
    pass
```
Verified that previously, and when the model was released, it never used vision features after the norm. So the test was probably re-activated in the meanwhile, when we had incorrect `hidden_states` from the vision tower.
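A minimal sketch of that failure mode, assuming a toy vision tower (all module and variable names here are hypothetical, not from the PR): when the loss is computed from the pre-norm hidden states, the final layernorm never enters the autograd graph, so its parameters receive no gradient, which is what the skipped gradient-checkpointing tests trip over.

```python
import torch
import torch.nn as nn

class TinyVisionTower(nn.Module):
    """Toy stand-in for a vision tower with a final (post-)layernorm."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.block = nn.Linear(dim, dim)          # stand-in for the encoder blocks
        self.post_layernorm = nn.LayerNorm(dim)   # the "last vision-layernorm"

    def forward(self, pixel_values):
        hidden = self.block(pixel_values)              # pre-norm hidden states
        last_hidden_state = self.post_layernorm(hidden)
        return last_hidden_state, hidden

tower = TinyVisionTower()
_, pre_norm = tower(torch.randn(2, 8))

# The VLM consumes the *pre-norm* features, so the loss never depends on
# post_layernorm and its parameters never receive a gradient.
pre_norm.sum().backward()

print(tower.block.weight.grad is not None)           # True
print(tower.post_layernorm.weight.grad is not None)  # False
```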
I hate rebasing, looks like new models were added to …
Cyrilvallez left a comment
Nice! I would just like to change the name of the arg, i.e. I don't really understand what `post_ln_hiddens` is. Let's try to find something clearer, and let's also document what it is in the decorator definition!
Feel free to merge afterwards! 🤗
ArthurZucker left a comment
LGTM, but yeah, let's rename the arg, it's not explicit!
[For maintainers] Suggested jobs to run (before merge): run-slow: aimv2, albert, apertus, arcee, aria, audio_spectrogram_transformer, aya_vision, bert, bert_generation, bitnet, blip, blip_2, blt, camembert, cohere, cohere2
* update all models
* fix copies
* skip aria tests
* update other models
* skip should be in test, not tester
* i think this is more descriptive as a name
* find and replace for new models
What does this PR do?
Fixes #40964
In most vision models, `output.hidden_states` are the hidden states right after the encoder blocks, i.e. before the last layernorm. Therefore, for these models `output.hidden_states[-1] != output.last_hidden_state`.

Currently `check_model_inputs` assumes that the last hidden state is the correct one to return, which is true for language models only. This PR adds a kwarg for `check_model_inputs` which decides whether to replace the last hidden state or not.

TBH, I think the way it is done in LMs is the ultimate correct version, and we probably need to "break" vision models. But I can't think of a way to obtain pre-norm last hidden states, which are needed for some VLMs.
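For illustration, a minimal sketch of the mechanism, assuming a simplified dict-returning `forward`; this is not the real transformers decorator (which operates on actual model outputs), and the kwarg name `replace_last_hidden_state` is made up here, since the reviewers above asked for a clearer name than `post_ln_hiddens`:

```python
from functools import wraps

def check_model_inputs(replace_last_hidden_state=True):
    """Simplified stand-in for the real decorator; the kwarg name is illustrative.

    When True (the LM behaviour), the final entry of hidden_states is
    overwritten with last_hidden_state. A vision tower would pass False so
    its pre-norm hidden states are preserved.
    """
    def decorator(forward):
        @wraps(forward)
        def wrapper(self, *args, **kwargs):
            output = forward(self, *args, **kwargs)
            if replace_last_hidden_state and output.get("hidden_states"):
                hidden = list(output["hidden_states"])
                hidden[-1] = output["last_hidden_state"]
                output["hidden_states"] = tuple(hidden)
            return output
        return wrapper
    return decorator

class DummyVisionModel:
    # hidden_states are pre-norm, last_hidden_state is post-norm
    @check_model_inputs(replace_last_hidden_state=False)
    def forward(self):
        return {"hidden_states": ("pre_norm_h0", "pre_norm_h1"),
                "last_hidden_state": "post_norm_h1"}

print(DummyVisionModel().forward()["hidden_states"])  # pre-norm entries kept
```

With the flag set to False, a vision tower keeps its pre-norm `hidden_states` intact while still exposing the post-norm `last_hidden_state`.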