portable_simd version of Avg (4bpp) by Sentimentron · Pull Request #641 · image-rs/image-png

Sentimentron · 2025-09-09T18:00:04Z

Implements an RGBA (4 bytes per pixel) version of the Avg filter with portable_simd intrinsics.
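For context, here is a minimal scalar sketch of what the Avg unfilter computes at 4 bytes per pixel, assuming the PNG definition in which each byte adds floor((left + above) / 2). The function name `unfilter_avg_4bpp` is hypothetical, and this is a plain reference, not the portable_simd code in this PR:

```rust
/// Hypothetical scalar reference for the Avg unfilter at 4 bytes per pixel.
/// `previous` is the already-reconstructed row above; `current` holds the
/// filtered bytes on entry and the reconstructed bytes on return.
fn unfilter_avg_4bpp(previous: &[u8], current: &mut [u8]) {
    assert_eq!(previous.len(), current.len());
    // The pixel to the left of the first pixel is treated as all zeros.
    let mut left = [0u8; 4];
    for (cur, above) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        for i in 0..4 {
            // Widen to u16 so left + above cannot overflow before halving.
            let avg = ((left[i] as u16 + above[i] as u16) / 2) as u8;
            cur[i] = cur[i].wrapping_add(avg);
        }
        // The reconstructed pixel becomes the "left" pixel for the next one.
        left.copy_from_slice(cur);
    }
}
```

Each pixel depends on the reconstructed pixel to its left, which is the serial dependency a SIMD version has to work around.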

CPU	Baseline	Result	Speedup
Arm Cortex A520	415.9 MiB/s	707.4 MiB/s	70.08%
Arm Cortex X4	2053.9 MiB/s	2334.9 MiB/s	13.68%
Apple Silicon M2	2053.9 MiB/s	2173.5 MiB/s	3.62%
AMD EPYC 7B13	2425.8 MiB/s	2150.8 MiB/s	-11.34%

Marked as draft until #632 is completed.

Sentimentron · 2025-09-09T18:58:01Z

AI disclosure: I wrote an original sliding-window portable_simd implementation of the Paeth filter (3bpp) and optimized it for best performance on the Cortex A520. I then used the Gemini family of LLMs provided by my employer to automatically adapt this code to the Avg filter from a written description, and then to optimize it for the best possible code generation and performance across all other micro-architectures in simulation. This PR is derived from that output, but includes documentation and other cleanups.

okaneco · 2025-09-12T19:40:18Z

There's another 4bpp case for the first row, where previous.is_empty(), not sure if you've tried that already.

image-png/src/filter.rs

Lines 612 to 624 in f33b850

    
    BytesPerPixel::Four => {
        let mut prev = [0; 4];
        for chunk in current.chunks_exact_mut(4) {
            let new_chunk = [
                chunk[0].wrapping_add(prev[0] / 2),
                chunk[1].wrapping_add(prev[1] / 2),
                chunk[2].wrapping_add(prev[2] / 2),
                chunk[3].wrapping_add(prev[3] / 2),
            ];
            *TryInto::<&mut [u8; 4]>::try_into(chunk).unwrap() = new_chunk;
            prev = new_chunk;
        }
    }
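One way to read this first-row case: with no previous row, the "above" byte is treated as zero, so the average reduces to `left / 2`. The sketch below checks that reading by comparing the two paths. `avg_first_row_4bpp` mirrors the snippet above, while `avg_with_previous_4bpp` and `first_row_matches_zero_previous` are hypothetical helpers written for illustration, not code from the crate or this PR:

```rust
/// Mirrors the first-row snippet quoted above (no previous row available).
fn avg_first_row_4bpp(current: &mut [u8]) {
    let mut prev = [0u8; 4];
    for chunk in current.chunks_exact_mut(4) {
        for i in 0..4 {
            chunk[i] = chunk[i].wrapping_add(prev[i] / 2);
        }
        prev.copy_from_slice(chunk);
    }
}

/// Hypothetical scalar reference for the general case with a previous row.
fn avg_with_previous_4bpp(previous: &[u8], current: &mut [u8]) {
    assert_eq!(previous.len(), current.len());
    let mut left = [0u8; 4];
    for (cur, above) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        for i in 0..4 {
            // Widen to u16 so the sum cannot overflow before halving.
            let avg = ((left[i] as u16 + above[i] as u16) / 2) as u8;
            cur[i] = cur[i].wrapping_add(avg);
        }
        left.copy_from_slice(cur);
    }
}

/// Returns true when both paths agree on `row`, feeding the general path an
/// all-zero previous row of the same length.
fn first_row_matches_zero_previous(row: &[u8]) -> bool {
    let mut a = row.to_vec();
    let mut b = row.to_vec();
    avg_first_row_4bpp(&mut a);
    avg_with_previous_4bpp(&vec![0u8; row.len()], &mut b);
    a == b
}
```

If that equivalence holds for the SIMD path as well, the first-row case could in principle reuse it with a zeroed row, though whether that is faster than a dedicated version would need measuring.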

Sentimentron · 2025-09-12T20:16:44Z

I hadn't tried it - wrote some quick code for it but it seems that the unfilter benchmark doesn't test this edge case... 🤔

Sentimentron · 2025-09-12T20:19:21Z

Also, if any contributors have access to some Intel hardware, could they give this portable_simd version a try? (Otherwise I'll cfg-gate it off in a subsequent version to avoid the AMD Epyc 7B13 regression).
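A rough sketch of what such gating could look like, assuming the choice is made per target architecture with `cfg`. All function names here are hypothetical, and both gated bodies fall back to a scalar placeholder so the example builds on stable Rust; a real version would call the portable_simd implementation in the aarch64 branch:

```rust
/// Hypothetical scalar Avg unfilter for 4 bytes per pixel.
fn avg_4bpp_scalar(previous: &[u8], current: &mut [u8]) {
    assert_eq!(previous.len(), current.len());
    let mut left = [0u8; 4];
    for (cur, above) in current.chunks_exact_mut(4).zip(previous.chunks_exact(4)) {
        for i in 0..4 {
            // Widen to u16 so the sum cannot overflow before halving.
            let avg = ((left[i] as u16 + above[i] as u16) / 2) as u8;
            cur[i] = cur[i].wrapping_add(avg);
        }
        left.copy_from_slice(cur);
    }
}

/// Compiled only for aarch64, where the benchmarks above showed gains.
#[cfg(target_arch = "aarch64")]
fn avg_4bpp(previous: &[u8], current: &mut [u8]) {
    // Placeholder: a real build would dispatch to the SIMD path here.
    avg_4bpp_scalar(previous, current)
}

/// Compiled for every other target, including x86_64, where the regression
/// was measured; these keep the scalar path.
#[cfg(not(target_arch = "aarch64"))]
fn avg_4bpp(previous: &[u8], current: &mut [u8]) {
    avg_4bpp_scalar(previous, current)
}
```

Callers only see `avg_4bpp`, so the gate can be widened later (for example, if Intel results turn out favourable) without touching call sites.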

Sentimentron · 2025-12-03T20:54:25Z

Rebaselining to rustc/cargo 1.93.0-nightly (2a7c49606 2025-11-25):

CPU	Baseline	Result	Speedup
Arm Cortex A520	434.84 MiB/s	740.59 MiB/s	70.58%
Arm Cortex X4	2052.5 MiB/s	2330.8 MiB/s	13.56%
Apple Silicon M2	2094.4 MiB/s	2167.4 MiB/s	3.51%
Apple Silicon M4 Pro	2771.9 MiB/s	2808.5 MiB/s	1.16% (insignificant)
AMD EPYC 7B13	2716.2 MiB/s	2335.8 MiB/s	-13.98%

Overall, I'd say it's still probably worth it for aarch64 systems; the A520 gain is particularly nice to have for low-end devices.

Again, Cortex-A520 seems the big winner here, going from 434 MiB/s to about 740 MiB/s (70% faster), X4 benefits less (about 13%).

Sentimentron force-pushed the portable_simd-avg-bpp4 branch from ac6b46c to 096960b · September 12, 2025 20:17

Sentimentron force-pushed the portable_simd-avg-bpp4 branch from 096960b to d221c8f · December 3, 2025 20:55

Sentimentron marked this pull request as ready for review December 9, 2025 19:48

perf: avg filter (4bpp)

94039f0


Sentimentron force-pushed the portable_simd-avg-bpp4 branch from d221c8f to 94039f0 · March 14, 2026 16:29

Sentimentron wants to merge 1 commit into image-rs:master from Sentimentron:portable_simd-avg-bpp4
