
perf: Neon Paeth unfiltering (3bpp) #638

Draft
Sentimentron wants to merge 1 commit into image-rs:master from Sentimentron:paeth-neon-3bpp


@Sentimentron
Contributor

As a counterpoint to #632, this PR ports the Neon Paeth unfiltering code from libpng.
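For reference, here is a minimal scalar sketch of Paeth unfiltering for 3 bytes per pixel, following the predictor defined in the PNG specification (filter type 4). It is not the libpng Neon code or the png crate's implementation, and the function names are hypothetical. It does show the serial dependency that limits vectorisation: each pixel's left neighbour `a` is the pixel that was just reconstructed, so a SIMD version largely has to parallelise across the three channels of one pixel rather than across pixels.

```rust
/// Paeth predictor from the PNG specification.
/// `a` = left byte, `b` = byte above, `c` = byte above-left.
fn paeth_predictor(a: u8, b: u8, c: u8) -> u8 {
    // Widen to i16 so a + b - c (range -255..=510) cannot overflow.
    let (ia, ib, ic) = (a as i16, b as i16, c as i16);
    let p = ia + ib - ic;
    let pa = (p - ia).abs();
    let pb = (p - ib).abs();
    let pc = (p - ic).abs();
    // Tie-breaking order a, b, c is mandated by the specification.
    if pa <= pb && pa <= pc {
        a
    } else if pb <= pc {
        b
    } else {
        c
    }
}

/// Hypothetical helper: unfilters one Paeth-filtered row in place for 3 bytes per pixel.
/// `prev` is the already-unfiltered previous row (all zeros for the first row).
fn unfilter_paeth_3bpp(prev: &[u8], current: &mut [u8]) {
    const BPP: usize = 3;
    assert_eq!(prev.len(), current.len());
    assert_eq!(current.len() % BPP, 0);

    // First pixel: a and c are taken as 0, so the predictor reduces to b.
    for i in 0..BPP.min(current.len()) {
        current[i] = current[i].wrapping_add(prev[i]);
    }
    // Remaining pixels: `a` reads bytes reconstructed in earlier iterations,
    // which is the serial dependency along the row.
    for i in BPP..current.len() {
        let a = current[i - BPP];
        let b = prev[i];
        let c = prev[i - BPP];
        current[i] = current[i].wrapping_add(paeth_predictor(a, b, c));
    }
}

fn main() {
    // Two RGB pixels; `row` holds the Paeth-filtered bytes of the raw row [1, 2, 3, 4, 5, 6].
    let prev = [10u8, 20, 30, 40, 50, 60];
    let mut row = [247u8, 238, 229, 220, 241, 232];
    unfilter_paeth_3bpp(&prev, &mut row);
    println!("{:?}", row); // [1, 2, 3, 4, 5, 6]
}
```

All byte arithmetic wraps modulo 256, as the specification requires, which is why `wrapping_add` is used for reconstruction.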

Theoretical results from a micro-architecture simulator indicate the following improvement in cycle count (note that the cache hierarchy is not modelled):

| Filter | Arm Cortex-A520 | Arm Cortex-X4 |
|---|---|---|
| Paeth (3bpp) | 24.44% | 37.63% |

Results from a Pixel 10:

| Filter | Throughput gain, Cortex-A520 | Throughput gain, Cortex-X4 | Baseline, Cortex-A520 | Baseline, Cortex-X4 | Neon, Cortex-A520 | Neon, Cortex-X4 |
|---|---|---|---|---|---|---|
| Paeth (3bpp) | 48.40% | 2.54% | 171.4 MiB/s | 703.2 MiB/s | 254.4 MiB/s | 721.0 MiB/s |
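As a quick check on the arithmetic, the throughput-gain columns follow from the baseline and Neon throughputs as `neon / baseline - 1`. The sketch below (helper name hypothetical) recomputes them from the rounded MiB/s figures in the Pixel 10 table; it gives about 48.42% and 2.53%, so the small differences from the table's 48.40% and 2.54% likely come from unrounded measurements.

```rust
/// Hypothetical helper: relative throughput gain, in percent, of `new` over `baseline`.
fn speedup_percent(baseline: f64, new: f64) -> f64 {
    (new / baseline - 1.0) * 100.0
}

fn main() {
    // Rounded throughputs (MiB/s) from the Pixel 10 results.
    let a520 = speedup_percent(171.4, 254.4);
    let x4 = speedup_percent(703.2, 721.0);
    println!("Cortex-A520: {a520:.2}%  Cortex-X4: {x4:.2}%");
}
```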

This is not really intended to be merged, but it offers another data point for the portable_simd discussion: portable_simd captures some of the gain, but less than what is possible with per-architecture intrinsics.

@Sentimentron changed the title from "perf: neon Paeth unfiltering (3bpp)" to "perf: Neon Paeth unfiltering (3bpp)" on Sep 2, 2025
@Sentimentron
Contributor Author

Results on Apple Silicon:

```
unfilter/filter=Paeth/bpp=3
                        time:   [18.147 µs 18.191 µs 18.258 µs]
                        thrpt:  [641.84 MiB/s 644.20 MiB/s 645.76 MiB/s]
                 change:
                        time:   [+2.6091% +2.9728% +3.3392%] (p = 0.00 < 0.05)
                        thrpt:  [−3.2313% −2.8870% −2.5427%]
```
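The two `change` lines in this criterion output are tied together: throughput is bytes per iteration divided by time, so a relative time change `t` corresponds to a throughput change of `1 / (1 + t) - 1`. Here the +2.97% time change therefore means roughly 2.9% lower throughput than criterion's saved baseline. The sketch below (helper name hypothetical) maps the reported time interval onto throughput and reproduces the figures above to within rounding; note that the bounds swap order, so the smallest time increase gives the smallest throughput loss.

```rust
/// Hypothetical helper: converts a relative time change (percent) into the
/// matching relative throughput change (percent), since throughput ∝ 1 / time.
fn throughput_change_percent(time_change_percent: f64) -> f64 {
    (1.0 / (1.0 + time_change_percent / 100.0) - 1.0) * 100.0
}

fn main() {
    // Lower bound, point estimate and upper bound of the reported time change.
    for t in [2.6091, 2.9728, 3.3392] {
        println!("time {t:+.4}% -> throughput {:+.4}%", throughput_change_percent(t));
    }
}
```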
