Implement P2300 bulk adapter for HPX executors#7240
Conversation
|
Can one of the admins verify this patch? |
Up to standards ✅🟢 Issues
|
5cdcfbd to
7b3fc5f
Compare
|
@shivansh023023 Just FYI, before you invest more time into our old sender/receiver implementation: #7123 will soon remove almost all of our code related to this. Please focus your efforts on whatever stays after this has been merged. |
|
Thank you for the heads-up regarding #7123. I appreciate the guidance to avoid spending effort on code that will soon be deprecated. I will pause my work on the current P2300 bulk adapter and follow the progress of #7123 closely. Once that is merged, I’ll re-evaluate how to implement the bulk functionality within the new framework. In the meantime, I’ll focus on finishing the review items for PR #7070 (local convenience header) and cleaning up any remaining issues on my other active PRs that are unaffected by the sender/receiver removal. |
7b3fc5f to
70c4501
Compare
|
Hi @hkaiser Thank you for the feedback. I have refactored the PR to align with the latest stdexec changes: Purged Legacy Code: Removed all !defined(HPX_HAVE_STDEXEC) blocks from executor_scheduler_bulk.hpp. Header Optimization: Removed the inclusions of executor_scheduler.hpp and executor_scheduler_bulk.hpp from the parallel_executor and sequenced_executor headers. Forward Declarations: Implemented forward declarations for the returned scheduler types in the executor headers to prevent bloat. Fixed Includes: Removed the reference to the deprecated get_scheduler.hpp header. The headers are now much cleaner. Let me know if there are any other areas you'd like me to slim down! |
…nder/receiver concepts, NVCC guard, dedup get_scheduler, fix post() UB, use get_scheduler in tests
bbfc7db to
1f4aeee
Compare
|
Hi @hkaiser Thank you for the architectural guidance! I have implemented your suggestions to clean up the header dependencies and tag_invoke logic: Forwarding Header: I created executor_scheduler_fwd.hpp and moved the declarations there, which allowed me to strip the redundant forward declarations from the individual executor headers. Tag-Invoke Overloads: I removed the default template arguments and duplicate overloads in parallel_executor and sequenced_executor, replacing them with the concrete friend functions you suggested. NVCC Fix: I ensured the NVCC/CUDACC guards for the destructors were correctly restored. Standard Test Patterns: I updated the unit tests to use the standard ex::get_scheduler(exec) call instead of manual constructor calls. The structure feels much more consistent with the rest of the HPX core modules now. Thanks! |
|
Hi @charan-003 Thanks for the catch on the stdexec compliance and the safety issues! I’ve updated the implementation as follows: Sender/Receiver Concepts: I added sender_concept to executor_sender and receiver_concept to executor_bulk_receiver. These are now correctly picked up by the stdexec concept checks. UB Fix in post: I refactored the lambda capture in executor_operation_state::start(). It now captures the state by reference and only moves the receiver inside the task body. This ensures that if hpx::parallel::execution::post throws, the receiver is still valid for the set_error call in the catch block. I really appreciate the eye for detail on that potential move-before-post UB,definitely a safer implementation now! |
d0b7f07 to
7ab5571
Compare
|
@hkaiser C++20 Concepts: I’ve replaced the legacy std::enable_if_t in the executor_scheduler constructor with a clean C++20 requires clause and included the header. Purged Redundant Guards: Removed the #if defined(HPX_HAVE_STDEXEC) guards in executor_scheduler_bulk.hpp, as we now rely strictly on the modern Sender/Receiver path. Header Optimization: I have completely removed the implementation headers (executor_scheduler.hpp and executor_scheduler_bulk.hpp) from parallel_executor.hpp and sequenced_executor.hpp. These files now rely solely on the lightweight executor_scheduler_fwd.hpp header, significantly reducing the include bloat for the core executors. The code is now much cleaner and follows the forwarding pattern we discussed. It should be ready for a final review! |
|
@shivansh023023 Thanks for the updates. I'd like to wait for #7257 to be finalized before merging anything else related to s&r. Please be patient as this may take a moment. |
Sure sir |
|
Hey @hkaiser , all the CI checks are passing and I've implemented all your feedback |
|
@charan-003 |
My Discord ID: charan_003 |
i've sent you a request |
d357952 to
8547843
Compare
|
@hkaiser I gutted the old tag_invoke logic and migrated everything to member functions per the latest P2300 spec. I also cleaned up the concept constraints and made sure the lambda captures are safe. |
There are still a couple of completely unrelated changes that should be removed. |
can you tell me what are those changes , so that i can remove them |
Please see my comments above (e.g., #7240 (comment) and others) |
8547843 to
645ef48
Compare
|
@hkaiser can you review it sir ? |
|
All the unrelated changes that I mentioned are still part of this PR (related to the |
Up to standards ✅🟢 Issues
|
|
Now you have suddenly touched on over 380 files! I'd suggest reverting the most recent commit and remove the unrelated changes manually (i.e., not involving the help of any LLM). |
2fa00bc to
22a10f4
Compare
|
@shivansh023023 Could you please rebase this onto master to resolve the conflicts? |
- Add executor_scheduler_bulk.hpp with bulk sender/receiver for executor-based schedulers - Add executor_algorithm_bulk.cpp unit test covering sequenced and parallel bulk execution - Register new header and test in CMakeLists Signed-off-by: Shivansh Singh <your_github_email@example.com>
c462df1 to
e172e8f
Compare
done |
hkaiser
left a comment
There was a problem hiding this comment.
LGTM, thanks! Do you plan to support bulk_chunked and bulk_unchunked as well?
|
@hkaiser Yes, I plan to support bulk_chunked and bulk_unchunked as well. this PR establishes the core P2300 bulk adapter infrastructure, I thought it was best to keep this diff focused and manageable. will open a separate follow-up PR shortly to implement the chunked and unchunked variations, building directly on top of this foundation. |
| auto tag_invoke(bulk_t, executor_scheduler<Executor> const& sched, | ||
| Sender&& sender, Shape const& shape, F&& f) | ||
| { | ||
| return detail::executor_bulk_sender<Executor, std::decay_t<Sender>, | ||
| Shape, std::decay_t<F>>{ | ||
| sched.exec_, HPX_FORWARD(Sender, sender), shape, HPX_FORWARD(F, f)}; | ||
| } |
There was a problem hiding this comment.
Suggestion: Replace tag_invoke with a member function. tag_invoke is deprecated.
|
@shivansh023023 I think that this PR should be the next one to be worked on and finalized. Could you please have a look at @charan-003's comment (#7240 (comment))? |
…r_scheduler Refactor the P2300 bulk customization point from a deprecated tag_invoke free function to a member function on executor_scheduler, matching the modern P2300 sender/receiver specification. - Add bulk() member declaration in executor_scheduler (executor_scheduler.hpp) - Define bulk() out-of-line in executor_scheduler_bulk.hpp after executor_bulk_sender is fully defined - Remove the tag_invoke(bulk_t, executor_scheduler, ...) free function Signed-off-by: Shivansh Singh <your_github_email@example.com>
|
LGTM |
Done |
P2300
bulkIntegration for HPX ExecutorsDescription
This PR implements the P2300
bulksender adapter specifically for HPX's legacy executors, including theparallel_executorandsequenced_executor. By bridging thehpx::execution::experimental::bulkalgorithm with HPX's internalbulk_sync_executemechanism, this change ensures that data-parallel workloads are properly load-balanced and optimized using HPX's high-performance partitioners.Key Technical Improvements:
bulksender to the underlying HPX execution engine for native parallel performance.tag_invokeOverloads: Implemented a full suite ofconnect_toverloads (supporting&&,&, andconst&) to ensure compatibility with consumer algorithms likesync_wait.std::decay_twithin thebulk_receiverto prevent reference collapsing and ensure the safety of deferred functional object execution.executorsmodule rather than the coreexecutionmodule.Proposed Changes
libs/core/executors/include/hpx/executors/executor_scheduler_bulk.hpp: Implementsexecutor_bulk_senderandexecutor_bulk_receiver.libs/core/executors/include/hpx/executors/executor_scheduler.hpp: Exposesget_completion_scheduler_tfor ADL discovery.libs/core/executors/include/hpx/executors/parallel_executor.hpp&sequenced_executor.hpp: Integrated the new bulk headers.libs/core/executors/tests/unit/executor_algorithm_bulk.cpp: Comprehensive validation for sequential and parallel policies.Background context
This work is part of the ongoing effort to modernize HPX's execution model to align with the C++23 Sender/Receiver (P2300) standard. Implementing
bulkis a "Big Impact" milestone because it allows modern asynchronous pipelines to tap into the mature, multi-threaded performance of the HPX runtime.Checklist
executor_algorithm_bulk_testtarget.inspecttool.