fork_join_executor silently truncates ranges > 2^32 #6922

@arpittkhandelwal

Summary

fork_join_executor currently narrows iteration ranges to 32-bit
indices. When size > UINT32_MAX (2^32 - 1, about 4.29 billion),
indices wrap silently, producing incorrect results with no warning
or exception.

This affects both static and dynamic scheduling paths.

Root Cause

  1. Static scheduling:
    In fork_join_executor.hpp, partition boundaries are cast to
    std::uint32_t:

    auto const part_begin = static_cast<std::uint32_t>(...);
    auto const part_end   = static_cast<std::uint32_t>(...);
    

    If size > UINT32_MAX, part_begin/part_end wrap.

  2. Dynamic scheduling:
    contiguous_index_queue is hardcoded to 32-bit indices via:

    static_assert(sizeof(T) <= 4, ...);
    

    It packs two 32-bit indices into a 64-bit atomic, so 64-bit
    ranges are fundamentally unsupported.
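
To make both failure modes concrete, here is a minimal standalone sketch. It is not HPX code: narrow_boundary and pack_range are hypothetical helpers, and the choice of which half of the 64-bit word holds which index is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not HPX code): narrows a partition boundary the same
// way a static_cast<std::uint32_t> does. Only the low 32 bits survive.
constexpr std::uint32_t narrow_boundary(std::uint64_t boundary)
{
    return static_cast<std::uint32_t>(boundary);
}

// Hypothetical helper: packs a [first, last) pair of 32-bit indices into one
// 64-bit word, the kind of layout a single 64-bit atomic can update at once.
// Which half holds which index is an assumption for this sketch.
constexpr std::uint64_t pack_range(std::uint32_t first, std::uint32_t last)
{
    return (static_cast<std::uint64_t>(first) << 32) | last;
}

constexpr std::uint64_t large_size = (std::uint64_t{1} << 32) + 10;

// A boundary just past 2^32 wraps to a small index instead of failing.
static_assert(narrow_boundary(large_size) == 10, "boundary wraps modulo 2^32");

// Both halves of the packed word are already full at 32 bits each, so there
// is no room left to widen either index.
static_assert(pack_range(1, 2) == 0x0000000100000002ULL, "two 32-bit halves");
```

The conversion is well defined in C++ (reduction modulo 2^32), which is why no compiler diagnostic or runtime error is required to appear.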

Impact

  • Silent incorrect results for large ranges
  • Affects for_each, transform, reduce, etc.
  • Particularly problematic for HPC workloads

Proposed Fix

Immediate (correctness):

  1. For loop_schedule::static_:
    Replace std::uint32_t with std::size_t in partition calculations.

  2. For loop_schedule::dynamic:
    Add runtime guard:

    if (size > std::numeric_limits<std::uint32_t>::max())
        throw std::overflow_error(
            "fork_join_executor: dynamic scheduling "
            "not supported for ranges >= 2^32");
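
The two immediate fixes could look roughly like the sketch below. The even-split formula is a hypothetical stand-in, since the actual partition expressions in fork_join_executor.hpp are not reproduced here; index_type, part_begin, part_end, fits_32bit_queue, and require_dynamic_range are all assumed names, and std::uint64_t stands in for std::size_t on a 64-bit target.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for std::size_t on a 64-bit platform (assumption of this sketch).
using index_type = std::uint64_t;

// Hypothetical even split of [0, size) across num_threads workers, computed
// entirely in 64-bit arithmetic so no boundary is narrowed. The first
// (size % num_threads) workers each receive one extra element.
constexpr index_type part_begin(
    index_type size, index_type num_threads, index_type t)
{
    return t * (size / num_threads) +
        (t < size % num_threads ? t : size % num_threads);
}

// The end of partition t is the beginning of partition t + 1.
constexpr index_type part_end(
    index_type size, index_type num_threads, index_type t)
{
    return part_begin(size, num_threads, t + 1);
}

// True if every index of [0, size) can be stored in a 32-bit queue slot.
constexpr bool fits_32bit_queue(index_type size)
{
    return size <= std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical guard for the dynamic path: fail loudly instead of wrapping.
inline void require_dynamic_range(index_type size)
{
    if (!fits_32bit_queue(size))
        throw std::overflow_error(
            "fork_join_executor: dynamic scheduling "
            "not supported for ranges >= 2^32");
}
```

With this split, the last partition ends exactly at size even when size exceeds UINT32_MAX, and the guard turns the silent wrap on the dynamic path into an exception the caller can handle.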
    

Long-term:

  • Either implement a 64-bit index queue (128-bit CAS), or
  • Provide a lock-based fallback for 64-bit indices.
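
As a rough idea of the lock-based option, the sketch below guards a pair of 64-bit indices with a mutex. locked_index_queue and its pop_left/pop_right interface are hypothetical and chosen for illustration; they are not claimed to match the existing contiguous_index_queue API.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical lock-based index queue over the half-open range [first, last).
// Not part of HPX; names and interface are assumptions for illustration.
class locked_index_queue
{
public:
    locked_index_queue(std::uint64_t first, std::uint64_t last)
      : first_(first)
      , last_(last)
    {
        assert(first <= last);
    }

    // Take the next index from the front. Returns false when the queue is empty.
    bool pop_left(std::uint64_t& index)
    {
        std::lock_guard<std::mutex> lock(mtx_);
        if (first_ == last_)
            return false;
        index = first_++;
        return true;
    }

    // Take an index from the back (for example, when another worker steals).
    // Returns false when the queue is empty.
    bool pop_right(std::uint64_t& index)
    {
        std::lock_guard<std::mutex> lock(mtx_);
        if (first_ == last_)
            return false;
        index = --last_;
        return true;
    }

private:
    std::mutex mtx_;
    std::uint64_t first_;
    std::uint64_t last_;
};
```

The trade-off is that every pop takes a lock, so this would likely only make sense as a fallback selected when the range does not fit in 32 bits, leaving the existing lock-free queue in place for smaller ranges.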
