Skip to content

Commit 2705b67

Browse files
authored
[UR][L0v2] Fix sync bug in enqueueEventsWaitWithBarrier (#21251)
`ur_queue_immediate_out_of_order_t::enqueueEventsWaitWithBarrier` has a copy-paste bug where it waits for barrier events `N` times on the first (internal) command list, instead of waiting on the `N` command lists once each. This is likely a copy-paste error from the preceding call to `barrierFn`, that was not caught in testing or code review. The bug does not seem to reproduce on any released GPUs on Linux, it looks as-if waiting for any event on a single command-list blocks dispatch from every other command-list on all our current GPUs. However I did not investigate this deeply, because I believe this is a clear error on UR's side either way. The bug IS reproducible on an Intel internal simulator; this is how I caught it. I can provide more details on internal channels if desired. For reference below is the reproducer used. Tested on BMG and Panther Lake, where it passes both before and after the PR, and with the simulator where it fails before, but is fixed by this change. The reproducer also passes with the level zero V1 adapter on the simulated device. <details> <summary>Reproducer</summary> ```cpp int main(int argc, char *argv[]) { sycl::queue q; // Out of order! int tripCount = 200'000'000; if (argc > 1) tripCount = std::atoi(argv[1]); int *a = sycl::malloc_shared<int>(1, q); int *b = sycl::malloc_shared<int>(1, q); q.single_task([=] { float sum = 0; for (int i = 0; i < tripCount; ++i) sum += sycl::sqrt(float(i)); *a = (sum > 0); }); q.ext_oneapi_submit_barrier(); q.single_task([=] { *b = *a + 1; }); q.wait(); std::cout << "a: " << *a << ", b: " << *b << std::endl; if (*a != 1 || *b != 2) { std::cout << "Test failed!" << std::endl; return 1; } std::cout << "Test passed!" << std::endl; } ``` </details> I am unsure how a reasonable test might be written to cover this, please advise if that's desired. Fixes: #20861
1 parent ebf9572 commit 2705b67

File tree

2 files changed

+1
-4
lines changed

2 files changed

+1
-4
lines changed

sycl/test-e2e/Regression/barrier_with_work.cpp

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,9 +9,6 @@
99
// UNSUPPORTED: linux && (gpu-intel-dg2 || arch-intel_gpu_bmg_g21 || arch-intel_gpu_pvc)
1010
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/20600
1111
//
12-
// XFAIL: windows && arch-intel_gpu_bmg_g21
13-
// XFAIL-TRACKER: https://github.com/intel/llvm/issues/20861
14-
//
1512
// Tests that barriers block all following execution on queues with active work.
1613
// For L0 we currently need to set
1714
// SYCL_PI_LEVEL_ZERO_USE_MULTIPLE_COMMANDLIST_BARRIERS to enable fix on certain

unified-runtime/source/adapters/level_zero/v2/queue_immediate_out_of_order.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -187,7 +187,7 @@ ur_result_t ur_queue_immediate_out_of_order_t::enqueueEventsWaitWithBarrier(
187187
}
188188

189189
for (size_t id = phEvent ? 1 : 0; id < numCommandLists; id++) {
190-
UR_CALL(std::invoke(barrierFn, commandListManagersLocked[0],
190+
UR_CALL(std::invoke(barrierFn, commandListManagersLocked[id],
191191
barrierEventsWaitList, nullptr));
192192
}
193193

0 commit comments

Comments
 (0)