[Frontend] enable gqa and flash attention fusion for prefill phase by GuoningHuang · Pull Request #677 · buddy-compiler/buddy-mlir

GuoningHuang · 2026-01-27T08:29:31Z

Summary

This PR introduces fusion logic that combines Grouped Query Attention (GQA) with Flash Attention, optimized specifically for the prefill phase.
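For reviewers less familiar with the pattern, here is a rough pure-Python sketch of what a fused GQA plus flash-style attention computes during prefill. It is not the code in this PR: the function names (`reference_gqa`, `fused_gqa_flash`), the nested-list tensor layout `[heads][tokens][dim]`, the causal mask, and the `block` size are all assumptions chosen for illustration. The first function is the naive form; the second shares each K/V head across a group of query heads and walks K/V in blocks with an online softmax, so the full score row is never held at once.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reference_gqa(q, k, v):
    """Naive causal GQA. q: [Hq][T][D]; k, v: [Hkv][T][D], Hq divisible by Hkv."""
    group = len(q) // len(k)          # query heads that share one K/V head
    dim = len(q[0][0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for h, q_head in enumerate(q):
        k_head, v_head = k[h // group], v[h // group]
        rows = []
        for i, q_i in enumerate(q_head):
            # Causal prefill: token i attends to tokens 0..i.
            scores = [scale * dot(q_i, k_head[j]) for j in range(i + 1)]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            rows.append([sum(w * v_head[j][d] for j, w in enumerate(weights)) / total
                         for d in range(dim)])
        out.append(rows)
    return out

def fused_gqa_flash(q, k, v, block=2):
    """Same result, computed block by block with an online softmax."""
    group = len(q) // len(k)
    dim = len(q[0][0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for h, q_head in enumerate(q):
        # GQA: index the shared K/V head directly instead of repeating K/V.
        k_head, v_head = k[h // group], v[h // group]
        rows = []
        for i, q_i in enumerate(q_head):
            running_max, running_sum, acc = -math.inf, 0.0, [0.0] * dim
            for start in range(0, i + 1, block):
                stop = min(start + block, i + 1)
                scores = [scale * dot(q_i, k_head[j]) for j in range(start, stop)]
                new_max = max(running_max, max(scores))
                # Rescale earlier partial results to the new running maximum.
                rescale = math.exp(running_max - new_max)
                weights = [math.exp(s - new_max) for s in scores]
                running_sum = running_sum * rescale + sum(weights)
                acc = [a * rescale + sum(w * v_head[start + n][d]
                                         for n, w in enumerate(weights))
                       for d, a in enumerate(acc)]
                running_max = new_max
            rows.append([a / running_sum for a in acc])
        out.append(rows)
    return out
```

Both functions should agree up to floating-point rounding; the benefit of the fused form in a real kernel comes from avoiding the repeated K/V tensors and the materialized score matrix, which this sketch only mimics structurally.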
Prefill throughput improves from 45 tokens/s to 49 tokens/s when running without numcl:
Before:

After:

GuoningHuang added 3 commits January 27, 2026 16:21

[Frontend] apply gqa_attention_fusion for prefill phase

82f9446

[Test] update test for gqa and flash attention fusion

d7e3832

[Examples] apply gqa and flash attention fusion for prefill phase

0859a29

GuoningHuang requested a review from zhanghb97 as a code owner January 27, 2026 08:29

GuoningHuang wants to merge 3 commits into buddy-compiler:main from GuoningHuang:gqa-prefill
