Closed
Description
Summary
fork_join_executor currently truncates iteration ranges to 32-bit
indices. When size > UINT32_MAX (~4.29B), indices wrap silently,
causing incorrect results with no warning or exception.
This affects both static and dynamic scheduling paths.
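The wrap-around described above can be reproduced in isolation. The following is a minimal sketch, not the actual HPX code; `narrow_index` is a hypothetical helper that stands in for the narrowing cast:

```cpp
#include <cstdint>

// Hypothetical helper (not part of HPX): mimics casting a 64-bit
// index down to 32 bits, as the issue describes.
constexpr std::uint32_t narrow_index(std::uint64_t i) noexcept
{
    // Conversion to an unsigned type keeps only the low 32 bits.
    return static_cast<std::uint32_t>(i);
}

// Indices up to UINT32_MAX survive the cast unchanged...
static_assert(narrow_index(UINT32_MAX) == UINT32_MAX,
    "values that fit in 32 bits are preserved");

// ...but 2^32 + 5 silently becomes 5.
static_assert(narrow_index((std::uint64_t{1} << 32) + 5) == 5,
    "values above UINT32_MAX wrap modulo 2^32");
```

Because unsigned narrowing is well-defined modular arithmetic, no compiler diagnostic or runtime error is required, which is why the failure is silent.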
Root Cause
- Static scheduling:
  In fork_join_executor.hpp, partition boundaries are cast to std::uint32_t:
      auto const part_begin = static_cast<std::uint32_t>(...);
      auto const part_end = static_cast<std::uint32_t>(...);
  If size > UINT32_MAX, part_begin and part_end wrap.
- Dynamic scheduling:
  contiguous_index_queue is hardcoded to 32-bit indices via:
      static_assert(sizeof(T) <= 4, ...);
  It packs two 32-bit indices into a 64-bit atomic, so 64-bit ranges are fundamentally unsupported.
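To illustrate why the packing scheme limits the index width, here is a simplified sketch of a queue that stores a half-open range [first, last) in one 64-bit atomic. The class name `packed_range_queue` and its interface are assumptions made for illustration; this is not the real contiguous_index_queue:

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical sketch (not HPX's contiguous_index_queue): the range
// [first, last) lives in a single 64-bit word so that both ends can be
// updated with one compare-and-swap.
class packed_range_queue
{
    std::atomic<std::uint64_t> state_;

    static constexpr std::uint64_t pack(
        std::uint32_t first, std::uint32_t last) noexcept
    {
        return (std::uint64_t{first} << 32) | last;
    }

public:
    packed_range_queue(std::uint32_t first, std::uint32_t last) noexcept
      : state_(pack(first, last))
    {
    }

    // Claims the next index from the front, or returns nullopt if empty.
    std::optional<std::uint32_t> pop_front() noexcept
    {
        std::uint64_t expected = state_.load(std::memory_order_relaxed);
        for (;;)
        {
            auto const first = static_cast<std::uint32_t>(expected >> 32);
            auto const last = static_cast<std::uint32_t>(expected);
            if (first >= last)
                return std::nullopt;
            // On failure, expected is reloaded and the loop retries.
            if (state_.compare_exchange_weak(expected,
                    pack(first + 1, last), std::memory_order_acq_rel,
                    std::memory_order_relaxed))
                return first;
        }
    }
};
```

Each half of the word has only 32 bits, so widening the indices to 64 bits would need a 128-bit atomic word or a different synchronization strategy.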
Impact
- Silent incorrect results for large ranges
- Affects for_each, transform, reduce, etc.
- Particularly problematic for HPC workloads
Proposed Fix
Immediate (correctness):
- For loop_schedule::static_:
  Replace std::uint32_t with std::size_t in partition calculations.
- For loop_schedule::dynamic:
  Add a runtime guard:
      if (size > std::numeric_limits<std::uint32_t>::max())
          throw std::overflow_error(
              "fork_join_executor: dynamic scheduling "
              "not supported for ranges >= 2^32");
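The two immediate fixes might look roughly like the sketch below. The function names `partition_bounds` and `check_dynamic_range` are hypothetical, and the partition formula is an assumption, since the issue elides the actual expression used in fork_join_executor.hpp. The sketch also assumes a platform where std::size_t is 64 bits wide:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <utility>

// Hypothetical static partitioning: returns the half-open range
// [begin, end) for chunk `part` out of `num_parts`, computed entirely
// in std::size_t so no boundary is narrowed to 32 bits.
inline std::pair<std::size_t, std::size_t> partition_bounds(
    std::size_t size, std::size_t part, std::size_t num_parts)
{
    std::size_t const base = size / num_parts;    // minimum chunk size
    std::size_t const rem = size % num_parts;     // first `rem` chunks get +1
    std::size_t const begin = part * base + std::min(part, rem);
    std::size_t const end = begin + base + (part < rem ? 1 : 0);
    return {begin, end};
}

// Hypothetical guard for the dynamic path: rejects ranges that a
// 32-bit index queue cannot represent.
inline void check_dynamic_range(std::size_t size)
{
    if (size > std::numeric_limits<std::uint32_t>::max())
        throw std::overflow_error(
            "fork_join_executor: dynamic scheduling "
            "not supported for ranges >= 2^32");
}
```

Splitting the work as quotient plus remainder avoids computing part * size, which could itself overflow for very large ranges.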
Long-term:
- Either implement a 64-bit index queue (128-bit CAS), or
- Provide a lock-based fallback for 64-bit indices.
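As a rough idea of the second option, a lock-based queue over 64-bit indices could be as simple as the sketch below. `locked_range_queue` is a hypothetical name and interface, not an existing HPX type, and a real implementation would need to weigh lock contention against the lock-free 32-bit path:

```cpp
#include <cstdint>
#include <mutex>
#include <optional>

// Hypothetical lock-based fallback (not part of HPX): holds a
// half-open range [first, last) of 64-bit indices behind a mutex.
class locked_range_queue
{
    std::mutex mtx_;
    std::uint64_t first_;
    std::uint64_t last_;

public:
    locked_range_queue(std::uint64_t first, std::uint64_t last)
      : first_(first)
      , last_(last)
    {
    }

    // Owner side: take the next index from the front.
    std::optional<std::uint64_t> pop_front()
    {
        std::lock_guard<std::mutex> lock(mtx_);
        if (first_ >= last_)
            return std::nullopt;
        return first_++;
    }

    // Thief side: take an index from the back.
    std::optional<std::uint64_t> pop_back()
    {
        std::lock_guard<std::mutex> lock(mtx_);
        if (first_ >= last_)
            return std::nullopt;
        return --last_;
    }
};
```

Such a fallback would only need to be selected when the range exceeds UINT32_MAX, leaving the existing atomic queue in place for smaller ranges.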