Rationale
---------

select_attn_backend previously returned "flash_attention_2" whenever flash_attn was installed and the device was CUDA, without checking whether the target model class actually declares FA2 support via HF's dispatcher. For BertForMaskedLM (Geneformer) that silently routed the model down a code path transformers can't actually dispatch, so the "Loading ... in bfloat16 for flash_attention_2 compatibility" warning wasn't just cosmetic noise: it flagged a branch that couldn't work. The helical integration-tests job doesn't install flash_attn, so this gap was invisible in CI.

Plan
----

* Add a supports_fa2 parameter to select_attn_backend. Only models whose class declares _supports_flash_attn / _supports_flash_attn_2 can take the FA2 branch; others (Geneformer) fall back to sdpa.
* Pass supports_fa2=True from HelixmRNA. Leave Geneformer on the default (False) and annotate the call site so callers who want FA2 for BertForMaskedLM know they have to wire flash_attn directly.
* Drop the now-unreachable bfloat16-for-FA2 warnings from Geneformer; the sdpa fallback path never triggers them.
* Add a flash-attn-integration CI job that installs flash_attn and smoke-tests both paths: Geneformer (regression guard: must still load on sdpa even with flash_attn present) and HelixmRNA (must actually run on the FA2 branch).
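The gating described in the plan can be sketched roughly as follows. This is a minimal illustration, not the actual helical implementation: the exact signature of select_attn_backend, the device argument, the flash_attn_installed override, and the class_declares_fa2 helper are all assumptions made for the example. Only the supports_fa2 parameter, its False default, and the two backend strings come from the description above.

```python
import importlib.util
from typing import Optional


def class_declares_fa2(model_cls) -> bool:
    """Hypothetical helper: True if the class sets either FA2 support flag."""
    return bool(
        getattr(model_cls, "_supports_flash_attn", False)
        or getattr(model_cls, "_supports_flash_attn_2", False)
    )


def select_attn_backend(
    device: str,
    supports_fa2: bool = False,
    flash_attn_installed: Optional[bool] = None,
) -> str:
    """Sketch of the selection logic; the real signature may differ.

    flash_attn_installed is an assumed override so the logic can be
    exercised without the package; by default it is probed at call time.
    """
    if flash_attn_installed is None:
        flash_attn_installed = importlib.util.find_spec("flash_attn") is not None

    on_cuda = str(device).startswith("cuda")

    # Take the FA2 branch only when the caller opted in, the package is
    # importable, and the device is CUDA. Everything else falls back to sdpa.
    if supports_fa2 and flash_attn_installed and on_cuda:
        return "flash_attention_2"
    return "sdpa"
```

Under these assumptions, a Geneformer-style call that leaves supports_fa2 at its default resolves to "sdpa" even when flash_attn is present, while a HelixmRNA-style call that passes supports_fa2=True resolves to "flash_attention_2" on CUDA.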