feat(nnx): add Grouped Query Attention (GQA) support #5180
base: main
Conversation
Hi @cgarciae. Could you please take a look at this PR when you have a moment? Thanks!
Hi @ayulockedin - thanks for the PR! Would you be able to add some tests explicitly comparing the results of nnx.dot_product_attention against jax.nn.dot_product_attention?
@samanklesaria On it, thanks!
Force-pushed 35eaf6f to 9fb1a8f
@samanklesaria Thanks for the review! I've added test_gqa_parity_with_jax to tests/nnx/nn/gqa_test.py. It forces the internal NNX Python implementation (by passing a dummy module) and compares the output against jax.nn.dot_product_attention (where I manually broadcast the GQA inputs) to ensure numerical equivalence. All tests are passing.
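For context, a minimal sketch of the kind of parity check described above (the shape values and tolerance are illustrative assumptions, and the dummy-module trick is omitted; this is not the exact test):

```python
# Hypothetical parity-test sketch; the real test additionally forces the
# internal NNX Python implementation by passing a dummy `module`.
import jax
import jax.numpy as jnp
import numpy as np
from flax import nnx

batch, q_len, kv_len, head_dim = 2, 8, 8, 16
num_heads, num_kv_heads = 8, 2  # 4 query heads share each key/value head

kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
query = jax.random.normal(kq, (batch, q_len, num_heads, head_dim))
key = jax.random.normal(kk, (batch, kv_len, num_kv_heads, head_dim))
value = jax.random.normal(kv, (batch, kv_len, num_kv_heads, head_dim))

# GQA path in nnx: fewer key/value heads than query heads.
out_nnx = nnx.dot_product_attention(query, key, value)

# Reference: manually broadcast the key/value heads so every query head has
# its own copy, then call the JAX primitive with equal head counts.
repeats = num_heads // num_kv_heads
out_ref = jax.nn.dot_product_attention(
    query,
    jnp.repeat(key, repeats, axis=2),
    jnp.repeat(value, repeats, axis=2),
)

np.testing.assert_allclose(out_nnx, out_ref, atol=1e-5)
```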
Force-pushed 9fb1a8f to 7ac9f52
@samanklesaria Good catch. I've refactored this so the rank assertions happen before broadcasting, which let me simplify the logic. Does it look good now?
@ayulockedin Looks like the pre-commit hooks are failing. Make sure you've set them up as described in https://flax.readthedocs.io/en/stable/contributing.html
Agreed. I removed the extra shape checks. The code now falls back to jax.nn.dot_product_attention whenever dropout_rate == 0 and module is None. @samanklesaria, thanks a lot for reviewing and helping me :)
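In rough pseudocode, the dispatch being described is something like the sketch below (the argument list is heavily simplified; only the branch condition comes from the comment above, everything else is an assumption):

```python
# Simplified dispatch sketch; the real nnx.dot_product_attention accepts
# many more arguments (bias, mask, dtype, precision, ...).
import jax

def dot_product_attention(query, key, value, *, dropout_rate=0.0, module=None):
  if dropout_rate == 0.0 and module is None:
    # Fast path: jax.nn.dot_product_attention already accepts GQA shapes,
    # i.e. the number of key/value heads may divide the number of query heads.
    return jax.nn.dot_product_attention(query, key, value)
  # Otherwise use the pure-Python NNX implementation, which is needed to
  # apply dropout or to record intermediates on `module`.
  raise NotImplementedError("fallback path omitted in this sketch")
```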
@ayulockedin Once all the tests pass I'll give it another look, but so far I don't see any major issues.
@samanklesaria There was another small typo that made the checks fail, but I've fixed it in the latest commit. It should be good to run the checks again, thanks!
Force-pushed 7233b20 to 2ba1a78
Force-pushed 2ba1a78 to c3a4286
samanklesaria left a comment
Looks good to me!
@samanklesaria Just a heads up: I've opened Issue #5198 to track the follow-up work for the MultiHeadAttention module updates. Once this functional PR lands, I plan to tackle that issue to bring full GQA parity to the class API (adding num_key_value_heads to init). Just wanted to link the two so we have a roadmap!
@ayulockedin @samanklesaria let's move the tests to attention_test.py.
Force-pushed e2294b6 to 4daedfe
Done! I've moved the GQA tests to attention_test.py (added as the TestGQADotProductAttention class) and removed the separate test file. I also ran the pre-commit hooks and squashed everything into a single clean commit. Ready for review! @cgarciae
What does this PR do?
This PR adds support for Grouped Query Attention (GQA) to nnx.dot_product_attention.
Previously, nnx.dot_product_attention required the number of heads in Query, Key, and Value to be identical. This caused a shape mismatch error when trying to use GQA configurations (where multiple Query heads share a single Key/Value head).
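As an illustration of the configuration this enables (the shape values below are arbitrary examples, not taken from the PR):

```python
# GQA-shaped inputs: 16 query heads attending through only 4 key/value heads.
import jax
from flax import nnx

batch, seq, head_dim = 2, 128, 64
num_query_heads, num_kv_heads = 16, 4  # each KV head is shared by 4 query heads

q = jax.numpy.zeros((batch, seq, num_query_heads, head_dim))
k = jax.numpy.zeros((batch, seq, num_kv_heads, head_dim))
v = jax.numpy.zeros((batch, seq, num_kv_heads, head_dim))

# Before this PR the mismatched head counts raised a shape error; with GQA
# support the call succeeds and returns (batch, seq, num_query_heads, head_dim).
out = nnx.dot_product_attention(q, k, v)
```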
Changes Implemented:
This change brings nnx into parity with jax.nn.dot_product_attention, enabling modern architectures (like Llama 3) to be implemented in NNX.
Fixes #5177
Checklist