Add OpenMP compile/link flags to setup.py for source builds #9456
Open
developer0hye wants to merge 1 commit into pytorch:main from
Conversation
Source builds of torchvision do not pass `-fopenmp` (compile) or `-lomp`/`-lgomp` (link) flags when building the `_C` extension. Since `at::parallel_for` is a header-only template whose `#pragma omp` directives are compiled into the calling translation unit (`_C.so`), the missing flags cause it to silently fall back to sequential execution. This has had no observable effect so far because no existing torchvision C++ kernel directly uses `at::parallel_for` or `#pragma omp`. However, upcoming changes (e.g. pytorch#9442) introduce `at::parallel_for`, and without these flags source builds get 0% speedup from parallelization.

- macOS: `-Xpreprocessor -fopenmp` (compile) + `-lomp` from PyTorch's bundled `libomp` (link)
- Linux: `-fopenmp` (compile) + `-lgomp` (link)
- Windows: unchanged (uses `/openmp` via MSVC, already handled separately)

Fixes pytorch#2783

Signed-off-by: Yonghye Kwon <[email protected]>
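The per-platform flag selection described above can be sketched as a small helper. This is an illustration only: the `openmp_flags` function name and structure are hypothetical, not torchvision's actual `setup.py` code, which would feed such values into the `_C` extension's `extra_compile_args`/`extra_link_args`.

```python
import sys


def openmp_flags(platform: str):
    """Return (compile_args, link_args) enabling OpenMP on the given platform.

    Hypothetical helper mirroring the flags listed in this PR; the name and
    shape are illustrative, not the real setup.py code.
    """
    if platform == "win32":
        # MSVC uses /openmp and is already handled separately in the build
        return [], []
    if platform == "darwin":
        # Apple clang needs -Xpreprocessor to accept -fopenmp;
        # link against PyTorch's bundled libomp
        return ["-Xpreprocessor", "-fopenmp"], ["-lomp"]
    # Linux (GCC/clang): plain -fopenmp, link against libgomp
    return ["-fopenmp"], ["-lgomp"]


compile_args, link_args = openmp_flags(sys.platform)
```

In a real `setup.py`, these lists would be appended to the extension's existing compile and link arguments rather than replacing them.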
Summary
Add the `-fopenmp` compile flag and `-lomp`/`-lgomp` link flag to `setup.py` so that `at::parallel_for` (and other OpenMP constructs) in torchvision's C++ extensions actually parallelize when built from source.

- macOS: `-Xpreprocessor -fopenmp` + `-lomp` (from PyTorch's bundled `libomp`)
- Linux: `-fopenmp` + `-lgomp`

Motivation
`at::parallel_for` is a header-only template (`ATen/Parallel.h`): its `#pragma omp parallel` directives are compiled into the calling translation unit (`_C.so`), not into `libtorch_cpu`. Without `-fopenmp` at compile time, the compiler silently ignores the pragma and `at::parallel_for` falls back to sequential execution.

No existing torchvision C++ kernel currently calls `at::parallel_for` directly, so this has had no observable effect. However:

- #9442 adds `at::parallel_for` to the `deform_conv2d` CPU forward; source builds get 0% speedup without this fix
- `warning: ignoring #pragma omp parallel` appears during source builds; this is the root cause
- `roi_align` OpenMP parallelization is blocked by the same missing flags

Pre-built pip/conda wheels are unaffected (CI build scripts handle OpenMP separately).
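To make the fallback concrete, here is a toy Python model of the range splitting that `at::parallel_for` performs. This is a simplified sketch for illustration, not ATen's actual C++ implementation: with OpenMP disabled, the whole range is handed to a single sequential call, which matches the observable behavior of an extension compiled without `-fopenmp`.

```python
def parallel_for(begin, end, grain_size, f, omp_enabled=True):
    """Toy model of at::parallel_for's chunking (illustrative only).

    With omp_enabled=False the entire [begin, end) range goes to one call
    of f, mimicking an extension compiled without -fopenmp.
    """
    if not omp_enabled or end - begin <= grain_size:
        f(begin, end)  # sequential fallback: a single chunk
        return
    chunk = max(grain_size, 1)
    for start in range(begin, end, chunk):
        # in the real implementation each chunk may run on its own OpenMP thread
        f(start, min(start + chunk, end))


chunks = []
parallel_for(0, 10, 3, lambda b, e: chunks.append((b, e)))
# chunks: [(0, 3), (3, 6), (6, 9), (9, 10)]

seq = []
parallel_for(0, 10, 3, lambda b, e: seq.append((b, e)), omp_enabled=False)
# seq: [(0, 10)]
```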
Verification
Before (current `setup.py`):

Thread scaling with `at::parallel_for` in `deform_conv2d` (#9442):

After (this PR):
Test plan
- `otool -L torchvision/_C.so | grep omp` shows `libomp` on a macOS source build
- `at::parallel_for` scales with thread count (using the benchmark from "Optimize CPU deform_conv2d forward pass with parallel im2col" #9442)
- `python -m pytest test/test_ops.py -v` passes
- Linux builds link `-lgomp` without errors

Fixes #2783
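The `otool`/`ldd` check from the test plan can be scripted. A minimal sketch, with the assumption that `links_openmp` and `check_extension` are hypothetical helper names (not part of the PR), that scans linker output for an OpenMP runtime:

```python
import subprocess
import sys


def links_openmp(linker_output: str) -> bool:
    """True if ldd/otool output mentions an OpenMP runtime library
    (libgomp on Linux, libomp on macOS). Hypothetical helper."""
    return any(lib in linker_output for lib in ("libgomp", "libomp"))


def check_extension(path: str) -> bool:
    # 'otool -L' on macOS, 'ldd' elsewhere; both list linked shared libraries
    cmd = ["otool", "-L", path] if sys.platform == "darwin" else ["ldd", path]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return links_openmp(out)
```

For example, `check_extension("torchvision/_C.so")` should return `True` after a source build with this PR's flags applied.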
Related: #9442, #4935, #6619, #9455
cc @NicolasHug
🤖 Generated with Claude Code