perf: IBA::resample improvements to speed up 20x or more #4993
For IBA::resample() with bilinear interpolation, almost all of the expense was due to its reliance on ImageBuf::interppixel, which is simple but constructs a new ImageBuf::ConstIterator EVERY TIME it is called, and that is very expensive. Reimplement it in a way that reuses a single iterator. This typically speeds up IBA::resample by 20x or more.

Also refactor resample to pull the handling of deep images out of the main inner loop and into a separate helper function. And add some benchmarking.

Signed-off-by: Larry Gritz <[email protected]>
I have added your fixes to the hwy branch and run a comparison. It now looks more fair :)
Ran some tests locally and can also confirm the perf goodness (on boring uint8 pngs) as well as successful Time in
Yeah, a speedup of around 5x is what I'd expect (or at least hope for) between a good scalar implementation and a good AVX2 one. Being very far from that in either direction tells me that one of them has something wrong with it that is probably easily fixable.
Approval on this?
If there's no regression in the results, then it's definitely ready :)