
Optimize matrix multiplication performance test for Apple M3 Max. #9183

Open: alexreinking wants to merge 3 commits into main from alexreinking/m3-matmul



@alexreinking (Member)

I ran a local parameter sweep to find the best parameters for the performance_matrix_multiply schedule on my Apple M3 Max. The comments have been updated to explain the schedules in a little more detail.

I searched a total of 256 configurations:

  • inner_tile_x: {1,2,3,4} * vec
  • inner_tile_y: {1,2,4,8}
  • tile_y divisors: {2,4,8,16}
  • tile_x divisors: {2,4,8,16}
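The sweep is the Cartesian product of four 4-element sets, so 4 × 4 × 4 × 4 = 256 configurations. A minimal C++ sketch of enumerating that grid is below. The SweepConfig struct and make_sweep function are hypothetical names, `vec` is assumed to be the vector width in elements, and how each divisor maps onto an actual tile size in the schedule is not shown here.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical record for one point in the sweep; field names mirror the
// parameter names listed above.
struct SweepConfig {
    int inner_tile_x;    // a multiple of the vector width
    int inner_tile_y;
    int tile_y_divisor;
    int tile_x_divisor;
};

// Enumerate the full Cartesian product of the four parameter sets.
// Assumption: `vec` is the native vector width (in elements) for the target.
std::vector<SweepConfig> make_sweep(int vec) {
    assert(vec > 0);
    const std::array<int, 4> x_multiples = {1, 2, 3, 4};
    const std::array<int, 4> inner_y = {1, 2, 4, 8};
    const std::array<int, 4> divisors = {2, 4, 8, 16};

    std::vector<SweepConfig> configs;
    // Loop order, outermost to innermost: inner_tile_x, inner_tile_y,
    // tile_y divisor, tile_x divisor.
    for (int mx : x_multiples)
        for (int iy : inner_y)
            for (int dy : divisors)
                for (int dx : divisors)
                    configs.push_back({mx * vec, iy, dy, dx});
    return configs;
}
```

Each returned configuration would then be used to build the schedule and benchmark it; the sketch only covers the enumeration step.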

Removed the "Uncomment to see the generated assembly" section because it doesn't compile (name collision with t, the benchmark time).

Checklist

  • Tests added or updated (not required for docs, CI config, or typo fixes)
  • Benchmarks are included here if the change is intended to affect performance.
  • Commits include AI attribution where applicable (see Code of Conduct)

alexreinking requested a review from shoaibkamil on June 22, 2026 21:23
Comment thread test/performance/matrix_multiplication.cpp Outdated
Comment thread test/performance/matrix_multiplication.cpp

shoaibkamil (Contributor) left a comment:

LGTM; only nits in my comments.

codecov (bot) commented Jun 22, 2026:
Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@2fad88f).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9183   +/-   ##
=======================================
  Coverage        ?   69.35%           
=======================================
  Files           ?      254           
  Lines           ?    78274           
  Branches        ?    18729           
=======================================
  Hits            ?    54290           
  Misses          ?    18471           
  Partials        ?     5513           
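As a quick consistency check on the table: Hits + Misses + Partials equals Lines, and Codecov does not count partially covered lines as covered, so the coverage figure works out as

$$
\text{coverage} = \frac{\text{Hits}}{\text{Lines}} = \frac{54290}{54290 + 18471 + 5513} = \frac{54290}{78274} \approx 0.693589
$$

The exact ratio is about 69.359%, so the reported 69.35% is consistent with Codecov rounding down to two decimal places, which is its default `round: down` setting unless a project's codecov.yml overrides it.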


A.set(mat_A);
B.set(mat_B);

// TODO: we really need a generic performance testing harness

Contributor:

Can we create an issue to track this?
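On the TODO about a generic performance testing harness: below is a minimal sketch of the kind of reusable timing helper such a harness might expose, written against std::chrono only. The name time_best_of and its parameters are hypothetical. Halide's tools/halide_benchmark.h already provides a benchmark() utility in a similar spirit, which a shared harness could build on instead.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: takes `samples` timing samples, each running `op`
// `iterations` times, and returns the best (minimum) average time per
// iteration in seconds. Taking the minimum filters out noise from other load.
template <typename F>
double time_best_of(F &&op, int samples, int iterations) {
    assert(samples >= 1 && iterations >= 1);
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int s = 0; s < samples; s++) {
        const auto start = Clock::now();
        for (int i = 0; i < iterations; i++) {
            op();
        }
        // Convert the integral clock duration to floating-point seconds.
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count() / iterations);
    }
    return best;
}
```

Usage would look like `double secs = time_best_of([&] { /* realize the pipeline */ }, 10, 5);`. Reporting the minimum rather than the mean is a common choice for microbenchmarks, since noise from other processes only ever makes a sample slower.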

for (int iy = 0; iy < matrix_size && halide_correct; iy++) {
for (int ix = 0; ix < matrix_size; ix++) {
halide_correct = halide_correct && (std::abs(output_ref(ix, iy) - output_halide(ix, iy)) < 0.001f);
bool halide_correct = [&] {

Contributor:

We also need a generic equalsish for floats, but maybe that's part of our generic perf testing harness.
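A minimal sketch of such an "equalsish" helper is below. It combines an absolute tolerance (defaulting to the same 0.001f threshold the existing check uses) with a relative tolerance for large magnitudes, plus an early-exit buffer comparison written as an immediately invoked lambda, in the same style as the `bool halide_correct = [&] {` line above. The names approx_equal and all_approx_equal, the rel_tol default, and the row-major buffer layout are assumptions, not part of the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical "equalsish" helper: true when a and b agree within an absolute
// tolerance (useful near zero) or within a relative tolerance (useful for
// large values). NaN inputs compare as not equal.
inline bool approx_equal(float a, float b, float abs_tol = 1e-3f, float rel_tol = 1e-5f) {
    const float diff = std::fabs(a - b);
    if (diff <= abs_tol) return true;
    return diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}

// Compare two row-major width x height buffers, stopping at the first
// mismatch. The immediately invoked lambda lets the result be a const bool.
inline bool all_approx_equal(const float *ref, const float *out, int width, int height) {
    const bool ok = [&] {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (!approx_equal(ref[y * width + x], out[y * width + x])) {
                    return false;
                }
            }
        }
        return true;
    }();
    return ok;
}
```

A purely absolute threshold like 0.001f gets stricter in relative terms as output magnitudes grow with matrix size; the relative term keeps the check meaningful for larger values.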

shoaibkamil (Contributor) left a comment:

LGTM!


Labels

None yet

Projects

None yet


2 participants