
Add OpenMP compile/link flags to setup.py for source builds #9456

Open

developer0hye wants to merge 1 commit into pytorch:main from developer0hye:fix/setup-openmp-flags

Conversation

Contributor

developer0hye commented Mar 27, 2026

Summary

  • Add -fopenmp compile flag and -lomp/-lgomp link flag to setup.py so that at::parallel_for (and other OpenMP constructs) in torchvision's C++ extensions actually parallelize when built from source.
  • macOS: -Xpreprocessor -fopenmp + -lomp (from PyTorch's bundled libomp)
  • Linux: -fopenmp + -lgomp
  • Windows: unchanged (MSVC handles OpenMP separately)
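The platform split above can be sketched as a small helper. This is illustrative only — the PR's actual diff is not shown here, and the function and variable names (`openmp_flags`, `extra_compile_args`, `extra_link_args`) are hypothetical:

```python
import sys

def openmp_flags(platform: str):
    """Pick OpenMP compile/link flags per platform (sketch of this PR's logic)."""
    if platform == "darwin":
        # Apple Clang needs -fopenmp routed through -Xpreprocessor,
        # and links against PyTorch's bundled libomp.
        return ["-Xpreprocessor", "-fopenmp"], ["-lomp"]
    if platform.startswith("linux"):
        # GCC: enable OpenMP and link the GNU OpenMP runtime.
        return ["-fopenmp"], ["-lgomp"]
    # Windows: MSVC's /openmp handling is unchanged by this PR.
    return [], []

extra_compile_args, extra_link_args = openmp_flags(sys.platform)
```

In a real `setup.py` these lists would be appended to the extension's existing `extra_compile_args`/`extra_link_args` rather than replacing them.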

Motivation

at::parallel_for is a header-only template (ATen/Parallel.h) — its #pragma omp parallel directives are compiled into the calling translation unit (_C.so), not into libtorch_cpu. Without -fopenmp at compile time, the compiler silently ignores the pragma and at::parallel_for falls back to sequential execution.
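As a quick runtime sanity check (not part of the PR), PyTorch exposes `torch.__config__.parallel_info()`, whose dump names the ATen parallel backend. Note the caveat: this reports how libtorch itself was built, which is separate from whether the extension's translation unit was compiled with `-fopenmp` — the latter is what this PR fixes. A minimal sketch:

```python
def backend_is_openmp(parallel_info: str) -> bool:
    """Return True if an ATen parallel_info dump reports the OpenMP backend."""
    return "OpenMP" in parallel_info

try:
    import torch  # only needed for the live check
    print(backend_is_openmp(torch.__config__.parallel_info()))
except ImportError:
    pass  # torch not installed; the helper still works on a saved dump
```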

No existing torchvision C++ kernel currently calls at::parallel_for directly, so this has had no observable effect. However:

  1. #9442 ("Optimize CPU deform_conv2d forward pass with parallel im2col") introduces at::parallel_for in the deform_conv2d CPU forward — source builds get 0% speedup without this fix
  2. #2783 ("warning: ignoring #pragma omp parallel", open since 2020) reports warning: ignoring #pragma omp parallel during source builds — this is the root cause
  3. #4935 ("torchvision.roi_align performance optimization with openMP") proposes roi_align OpenMP parallelization — blocked by the same missing flags
  4. #6619 ("[RFC] torchvision performance optimization on CPU") proposes broader CPU kernel parallelization — all future work of this kind depends on these flags

Pre-built pip/conda wheels are unaffected (CI build scripts handle OpenMP separately).

Verification

Note: All verification below was performed on macOS ARM only (Apple M2 Air, macOS 26.3.1, Python 3.12, PyTorch 2.10.0). The Linux (-lgomp) path has not been locally tested and needs CI or a separate Linux verification.

Before (current setup.py):

$ otool -L torchvision/_C.so | grep omp
(nothing)

Thread scaling with at::parallel_for in deform_conv2d (#9442):

Threads=1: 2.99ms    Threads=4: 2.65ms   ← no scaling

After (this PR):

$ otool -L torchvision/_C.so | grep omp
  .../libomp.dylib
Threads=1: 2.91ms    Threads=4: 1.07ms   ← 2.7× scaling

Test plan

  • Verify otool -L torchvision/_C.so | grep omp shows libomp on macOS source build
  • Verify at::parallel_for scales with thread count (using the #9442 benchmark)
  • Verify existing tests pass on macOS ARM: python -m pytest test/test_ops.py -v
  • Needs verification: Linux source build links -lgomp without errors
  • Needs verification: Pre-built wheels are unaffected (no behavior change for pip installs)
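The first test-plan item could be scripted so it runs on both platforms — `otool -L` on macOS, `ldd` on Linux. A sketch (the path to `_C.so` and the helper names are assumptions, not part of the PR):

```python
import subprocess
import sys

def linked_omp_libs(listing: str) -> list[str]:
    """Extract OpenMP runtime lines (libomp/libgomp) from otool -L / ldd output."""
    return [line.strip() for line in listing.splitlines()
            if "libomp" in line or "libgomp" in line]

def check_extension(so_path: str) -> list[str]:
    """Run the platform's shared-library lister and return any OpenMP runtimes."""
    tool = ["otool", "-L"] if sys.platform == "darwin" else ["ldd"]
    out = subprocess.run(tool + [so_path], capture_output=True,
                         text=True, check=True).stdout
    return linked_omp_libs(out)

# e.g. check_extension("torchvision/_C.so") should be non-empty after this PR
```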

Fixes #2783
Related: #9442, #4935, #6619, #9455

cc @NicolasHug

🤖 Generated with Claude Code

Source builds of torchvision do not pass -fopenmp (compile) or
-lomp/-lgomp (link) flags when building the _C extension. Since
at::parallel_for is a header-only template whose #pragma omp directives
are compiled into the calling translation unit (_C.so), the missing
flags cause it to silently fall back to sequential execution.

This has had no observable effect so far because no existing torchvision
C++ kernel directly uses at::parallel_for or #pragma omp. However,
upcoming changes (e.g. pytorch#9442) introduce at::parallel_for, and without
these flags source builds get 0% speedup from parallelization.

- macOS: -Xpreprocessor -fopenmp (compile) + -lomp from PyTorch's
  bundled libomp (link)
- Linux: -fopenmp (compile) + -lgomp (link)
- Windows: unchanged (uses /openmp via MSVC, already handled separately)

Fixes pytorch#2783

Signed-off-by: Yonghye Kwon <[email protected]>
@pytorch-bot

pytorch-bot bot commented Mar 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9456

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the cla signed label Mar 27, 2026

Development

Successfully merging this pull request may close these issues.

warning: ignoring #pragma omp parallel
