perf: IBA::resample improvements to speed up 20x or more #4993

lgritz · 2026-01-06T06:56:34Z

For IBA::resample() when bilinear interpolation is used, almost all of the expense was due to its relying on ImageBuf::interppixel which is simple but constructs a new ImageBuf::ConstIterator EVERY TIME, which is very expensive.

Reimplement in a way that reuses a single iterator. This typically speeds up IBA::resample by 20x or more.

Also refactor resample to pull the handling of deep images into a separate helper function and out of the main inner loop. And add some benchmarking.
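To illustrate the pattern described above, here is a minimal, self-contained sketch. It does not use OpenImageIO and is not the code from this PR: `ToyImage`, `ToyReader`, `resample_per_pixel`, `resample_shared`, and the `g_reader_constructions` counter are all hypothetical stand-ins. `ToyReader` plays the role of an accessor with non-trivial setup cost (like an iterator), and the counter only makes visible how often that setup happens in each variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel float image (stand-in for a real image buffer).
struct ToyImage {
    int width, height;
    std::vector<float> pixels;  // row-major
    ToyImage(int w, int h, float fill = 0.0f)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, fill) {}
};

// Counts how many readers were constructed, so the two variants can be compared.
long g_reader_constructions = 0;

// Hypothetical accessor standing in for an iterator with setup cost.
class ToyReader {
public:
    explicit ToyReader(const ToyImage& img) : m_img(img) { ++g_reader_constructions; }

    // Pixel lookup with coordinates clamped to the image bounds.
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), m_img.width - 1);
        y = std::min(std::max(y, 0), m_img.height - 1);
        return m_img.pixels[static_cast<std::size_t>(y) * m_img.width + x];
    }

    // Bilinear lookup at continuous coordinates (pixel centers at integer + 0.5).
    float bilinear(float x, float y) const {
        x -= 0.5f;
        y -= 0.5f;
        int x0 = static_cast<int>(std::floor(x));
        int y0 = static_cast<int>(std::floor(y));
        float fx = x - x0, fy = y - y0;
        float top    = at(x0, y0) * (1.0f - fx) + at(x0 + 1, y0) * fx;
        float bottom = at(x0, y0 + 1) * (1.0f - fx) + at(x0 + 1, y0 + 1) * fx;
        return top * (1.0f - fy) + bottom * fy;
    }

private:
    const ToyImage& m_img;
};

// Slow pattern: a fresh reader is constructed for every output pixel.
ToyImage resample_per_pixel(const ToyImage& src, int dst_w, int dst_h) {
    assert(dst_w > 0 && dst_h > 0);
    ToyImage dst(dst_w, dst_h);
    for (int y = 0; y < dst_h; ++y)
        for (int x = 0; x < dst_w; ++x) {
            float sx = (x + 0.5f) * src.width / dst_w;
            float sy = (y + 0.5f) * src.height / dst_h;
            dst.pixels[static_cast<std::size_t>(y) * dst_w + x] = ToyReader(src).bilinear(sx, sy);
        }
    return dst;
}

// Faster pattern: one reader is constructed up front and reused for all pixels.
ToyImage resample_shared(const ToyImage& src, int dst_w, int dst_h) {
    assert(dst_w > 0 && dst_h > 0);
    ToyImage dst(dst_w, dst_h);
    ToyReader reader(src);  // constructed once, outside the loops
    for (int y = 0; y < dst_h; ++y)
        for (int x = 0; x < dst_w; ++x) {
            float sx = (x + 0.5f) * src.width / dst_w;
            float sy = (y + 0.5f) * src.height / dst_h;
            dst.pixels[static_cast<std::size_t>(y) * dst_w + x] = reader.bilinear(sx, sy);
        }
    return dst;
}
```

Both functions compute the same pixels; only the number of reader constructions differs (one per output pixel versus one per call). In this toy the constructor is cheap, so the sketch shows the structure of the change rather than the measured speedup.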


ssh4net · 2026-01-07T04:47:46Z

I have added your fixes to hwy branch and run a comparison. It now looks more fair :)

Type       | Scalar(ms) |   SIMD(ms) | Speedup

[ Resample 75% ]
uint8      |       9.43 |       1.67 |   5.65x
uint16     |       9.17 |       1.81 |   5.07x
uint32     |      26.92 |      19.57 |   1.38x
float      |       9.55 |       1.87 |   5.11x
half       |      11.30 |       4.69 |   2.41x
double     |      32.98 |      30.87 |   1.07x

[ Resample 50% ]
uint8      |       4.20 |       0.79 |   5.31x
uint16     |       4.19 |       0.89 |   4.70x
uint32     |      17.40 |      17.01 |   1.02x
float      |       4.33 |       0.79 |   5.48x
half       |       5.36 |       2.25 |   2.38x
double     |      20.27 |      18.89 |   1.07x

[ Resample 25% ]
uint8      |       1.24 |       0.24 |   5.05x
uint16     |       1.19 |       0.26 |   4.52x
uint32     |      11.98 |      11.16 |   1.07x
float      |       1.08 |       0.20 |   5.39x
half       |       1.29 |       0.55 |   2.37x
double     |      16.50 |      13.57 |   1.22x

jessey-git · 2026-01-07T05:18:16Z

Ran some tests locally and can also confirm the perf goodness (on boring uint8 pngs) as well as successful idiff comparisons of main vs the PR.

Time in ms average over 5 attempts:

| Resize | main (ms) | PR (ms) |
| --- | ---: | ---: |
| 4096x4096, 25% --> 1024x1024 | 282.5024 | 16.9484 |
| 4096x4096, 33% --> 1351x1351 | 484.7668 | 29.4051 |
| 4096x4096, 50% --> 2048x2048 | 1126.7262 | 61.9360 |
| 4096x4096, 67% --> 2744x2744 | 2021.3136 | 96.1518 |
| 4096x4096, 75% --> 3072x3072 | 2529.3684 | 124.1163 |
| 3840x2160, 25% --> 960x540 | 139.5478 | 7.0885 |
| 3840x2160, 33% --> 1267x712 | 243.2302 | 12.2748 |
| 3840x2160, 50% --> 1920x1080 | 558.0922 | 25.5491 |
| 3840x2160, 67% --> 2572x1447 | 996.6915 | 47.6503 |
| 3840x2160, 75% --> 2880x1620 | 1247.9340 | 62.8001 |
| 2160x3840, 25% --> 540x960 | 139.3882 | 8.7678 |
| 2160x3840, 33% --> 712x1267 | 244.5914 | 16.3722 |
| 2160x3840, 50% --> 1080x1920 | 557.2218 | 25.6951 |
| 2160x3840, 67% --> 1447x2572 | 994.1367 | 45.7704 |
| 2160x3840, 75% --> 1620x2880 | 1246.6437 | 60.2098 |
| 116x2052, 25% --> 29x513 | 4.1224 | 0.3348 |
| 116x2052, 33% --> 38x677 | 7.1067 | 0.4689 |
| 116x2052, 50% --> 58x1026 | 16.1875 | 0.8663 |
| 116x2052, 67% --> 77x1374 | 28.1018 | 1.4981 |
| 116x2052, 75% --> 87x1539 | 36.0132 | 2.4759 |
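
For readers who want to reproduce this kind of "average over N attempts" measurement, one possible shape for the timing loop is sketched below. This is an assumption about how such numbers could be collected, not the harness actually used for the table: `average_ms` is a hypothetical helper, and the callable passed to it would wrap whatever resize call is being measured.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: run `work` `attempts` times and return the mean
// wall-clock time per run in milliseconds.
template <typename Func>
double average_ms(Func&& work, int attempts = 5) {
    assert(attempts > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double total_ms = 0.0;
    for (int i = 0; i < attempts; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / attempts;
}
```

Calling `average_ms([&] { /* resize call under test */ }, 5)` once on main and once on the PR build would give a pair of numbers comparable to one row of the table. A plain mean is sensitive to a slow first run (cold caches, file I/O), so a warm-up run or a median may be preferable in practice.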

lgritz · 2026-01-07T07:46:06Z

> I have added your fixes to hwy branch and run a comparison. It now looks more fair :)

Yeah, around 5x speedup is what I'd expect, or at least hope for, for a good scalar vs good AVX2 implementation. Very far from that in either direction tells me that one of them has something wrong with it that is probably easily fixable.

lgritz · 2026-01-08T03:29:56Z

Approval on this?

ssh4net · 2026-01-08T03:32:00Z

If there is no regression in the results, then it is ready for sure :)


lgritz mentioned this pull request Jan 7, 2026

Add Google Highway SIMD acceleration for ImageBufAlgo operations #4986


jessey-git approved these changes Jan 8, 2026


lgritz merged commit 774368d into AcademySoftwareFoundation:main Jan 8, 2026
29 checks passed

lgritz deleted the lg-resamplespeed branch January 8, 2026 19:48
