TTFT Performance Improvements#14

Closed
kurtis-b-1 wants to merge 20 commits into amd:bring-in-the-llama from kurtis-b-1:kurtis-b-1/ttft-improvements

@kurtis-b-1
Contributor

Profiling revealed several immediate optimization opportunities. This PR implements them.

Builds upon #8.

Added

  • Missing microkernel dims and kernel flags for AIE GEMM operator

Changed

  • Pipelined the RMS norm and element-wise mul stages of the weighted RMS norm operation
  • Refactored the SiLU, Element-Wise Mul, and RMS Norm AIE operators, using the Element-Wise Addition AIE operator as a template
  • GEMM kernel compilation now depends on whether the prio_accuracy flag is True or False
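
For reference, the weighted RMS norm mentioned above is an RMS normalization followed by an element-wise multiplication with a weight vector. The sketch below is a plain-Python illustration of that math only, not the AIE operator code from this PR. The function names and the `eps` default are assumptions made for illustration.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale x by the reciprocal of its root mean square.

    eps is an assumed stabilizer value, not taken from this PR.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def eltwise_mul(a, b):
    """Multiply two equal-length vectors element by element."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [p * q for p, q in zip(a, b)]


def weighted_rms_norm(x, weight, eps=1e-6):
    """Stage 1: RMS norm. Stage 2: element-wise mul with the weights."""
    return eltwise_mul(rms_norm(x, eps), weight)
```

A host-side reference like this could be used to check the output of the two chained stages against expected values, for example `weighted_rms_norm([3.0, 4.0], [1.0, 2.0])`.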

Removed

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.

@andrej
Collaborator

andrej commented Nov 13, 2025

To make review easier (smaller diff, exclude changes from adding llama), you can change the base of this PR to PR #8 (branch bring-in-the-llama). Took me a sec to find this (you might already know how) but it's if you click "Edit" at the top of the page, then under the title text box select a different base. We'll merge #8 first, and I think then the base of this should automatically change to devel. I've just done the same for #10 and #11.

@kurtis-b-1 kurtis-b-1 changed the base branch from devel to bring-in-the-llama November 13, 2025 23:54
@andrej andrej force-pushed the bring-in-the-llama branch from 9da3796 to 28058d1 Compare November 14, 2025 20:46
@andrej andrej deleted the branch amd:bring-in-the-llama November 15, 2025 00:10
@andrej andrej closed this Nov 15, 2025
@kurtis-b-1 kurtis-b-1 deleted the kurtis-b-1/ttft-improvements branch November 17, 2025 20:55