
Add OpenMP compile/link flags to setup.py for source builds #9456

Open

developer0hye wants to merge 1 commit into pytorch:main from developer0hye:fix/setup-openmp-flags

Conversation

Contributor

developer0hye commented Mar 27, 2026

Summary

  • Add -fopenmp compile flag and -lomp/-lgomp link flag to setup.py so that at::parallel_for (and other OpenMP constructs) in torchvision's C++ extensions actually parallelize when built from source.
  • macOS: -Xpreprocessor -fopenmp + -lomp (from PyTorch's bundled libomp)
  • Linux: -fopenmp + -lgomp
  • Windows: unchanged (MSVC handles OpenMP separately)
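The platform split above can be sketched as a small helper. This is illustrative only — the PR's actual diff is not shown here, and the function and variable names (`openmp_flags`, `extra_compile_args`, `extra_link_args`) are hypothetical:

```python
import sys

def openmp_flags(platform: str):
    """Pick OpenMP compile/link flags per platform (sketch of this PR's logic)."""
    if platform == "darwin":
        # Apple Clang needs -fopenmp routed through -Xpreprocessor,
        # and links against PyTorch's bundled libomp.
        return ["-Xpreprocessor", "-fopenmp"], ["-lomp"]
    if platform.startswith("linux"):
        # GCC: enable OpenMP and link the GNU OpenMP runtime.
        return ["-fopenmp"], ["-lgomp"]
    # Windows: MSVC's /openmp handling is unchanged by this PR.
    return [], []

extra_compile_args, extra_link_args = openmp_flags(sys.platform)
```

In a real `setup.py` these lists would be appended to the extension's existing `extra_compile_args`/`extra_link_args` rather than replacing them.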

Motivation

at::parallel_for is a header-only template (ATen/Parallel.h) — its #pragma omp parallel directives are compiled into the calling translation unit (_C.so), not into libtorch_cpu. Without -fopenmp at compile time, the compiler silently ignores the pragma and at::parallel_for falls back to sequential execution.
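As a quick runtime sanity check (not part of the PR), PyTorch exposes `torch.__config__.parallel_info()`, whose dump names the ATen parallel backend. Note the caveat: this reports how libtorch itself was built, which is separate from whether the extension's translation unit was compiled with `-fopenmp` — the latter is what this PR fixes. A minimal sketch:

```python
def backend_is_openmp(parallel_info: str) -> bool:
    """Return True if an ATen parallel_info dump reports the OpenMP backend."""
    return "OpenMP" in parallel_info

try:
    import torch  # only needed for the live check
    print(backend_is_openmp(torch.__config__.parallel_info()))
except ImportError:
    pass  # torch not installed; the helper still works on a saved dump
```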

No existing torchvision C++ kernel currently calls at::parallel_for directly, so this has had no observable effect. However:

  1. #9442 ("Optimize CPU deform_conv2d forward pass with parallel im2col") introduces at::parallel_for in the deform_conv2d CPU forward — source builds get 0% speedup without this fix
  2. #2783 ("warning: ignoring #pragma omp parallel", open since 2020) reports warning: ignoring #pragma omp parallel during source builds — this is the root cause
  3. #4935 ("torchvision.roi_align performance optimization with openMP") proposes roi_align OpenMP parallelization — blocked by the same missing flags
  4. #6619 ("[RFC] torchvision performance optimization on CPU") proposes broader CPU kernel parallelization — all future work of this kind depends on these flags

Pre-built pip/conda wheels are unaffected (CI build scripts handle OpenMP separately).

Verification

Note: All verification below was performed on macOS ARM only (Apple M2 Air, macOS 26.3.1, Python 3.12, PyTorch 2.10.0). The Linux (-lgomp) path has not been locally tested and needs CI or a separate Linux verification.

Before (current setup.py):

$ otool -L torchvision/_C.so | grep omp
(nothing)

Thread scaling with at::parallel_for in deform_conv2d (#9442):

Threads=1: 2.99ms    Threads=4: 2.65ms   ← no scaling

After (this PR):

$ otool -L torchvision/_C.so | grep omp
  .../libomp.dylib
Threads=1: 2.91ms    Threads=4: 1.07ms   ← 2.7× scaling

Test plan

  • Verify otool -L torchvision/_C.so | grep omp shows libomp on macOS source build
  • Verify at::parallel_for scales with thread count (using the #9442 benchmark)
  • Verify existing tests pass on macOS ARM: python -m pytest test/test_ops.py -v
  • Needs verification: Linux source build links -lgomp without errors
  • Needs verification: Pre-built wheels are unaffected (no behavior change for pip installs)
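The first test-plan item could be scripted so it runs on both platforms — `otool -L` on macOS, `ldd` on Linux. A sketch (the path to `_C.so` and the helper names are assumptions, not part of the PR):

```python
import subprocess
import sys

def linked_omp_libs(listing: str) -> list[str]:
    """Extract OpenMP runtime lines (libomp/libgomp) from otool -L / ldd output."""
    return [line.strip() for line in listing.splitlines()
            if "libomp" in line or "libgomp" in line]

def check_extension(so_path: str) -> list[str]:
    """Run the platform's shared-library lister and return any OpenMP runtimes."""
    tool = ["otool", "-L"] if sys.platform == "darwin" else ["ldd"]
    out = subprocess.run(tool + [so_path], capture_output=True,
                         text=True, check=True).stdout
    return linked_omp_libs(out)

# e.g. check_extension("torchvision/_C.so") should be non-empty after this PR
```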

Fixes #2783
Related: #9442, #4935, #6619, #9455

cc @NicolasHug

🤖 Generated with Claude Code

Source builds of torchvision do not pass -fopenmp (compile) or
-lomp/-lgomp (link) flags when building the _C extension. Since
at::parallel_for is a header-only template whose #pragma omp directives
are compiled into the calling translation unit (_C.so), the missing
flags cause it to silently fall back to sequential execution.

This has had no observable effect so far because no existing torchvision
C++ kernel directly uses at::parallel_for or #pragma omp. However,
upcoming changes (e.g. pytorch#9442) introduce at::parallel_for, and without
these flags source builds get 0% speedup from parallelization.

- macOS: -Xpreprocessor -fopenmp (compile) + -lomp from PyTorch's
  bundled libomp (link)
- Linux: -fopenmp (compile) + -lgomp (link)
- Windows: unchanged (uses /openmp via MSVC, already handled separately)

Fixes pytorch#2783

Signed-off-by: Yonghye Kwon <[email protected]>
@pytorch-bot

pytorch-bot bot commented Mar 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9456

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the cla signed label Mar 27, 2026

Development

Successfully merging this pull request may close these issues.

warning: ignoring #pragma omp parallel
