Commit cb4a14f

amd-hsong, Song authored and committed
[rocm-libraries] ROCm/rocm-libraries#5156 (commit 195bdc2)
[rocrand] Fix benchmark_rocrand_device_api launch parameters (#5156)

## Motivation

When running `benchmark_rocrand_device_api` on certain GPU architectures, the kernel launch can fail with `hipErrorLaunchFailure` because the launch parameters exceed the kernel's launch bounds. This PR fixes the issue.

## Technical Details

The code that determines an optimal block size for launching the benchmark kernels did not honor the kernel's `maxThreadsPerBlock` attribute. As a result, the chosen thread count could exceed `maxThreadsPerBlock` on certain architectures.

## Test Plan

Build and run `benchmark_rocrand_device_api` on gfx1200 to confirm that the launch parameters are now within launch bounds.

## Test Result

The test passes.

## Submission Checklist

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Song <hsong@ctr2-alola-ctrl-01.amd.com>
1 parent f0c83d6 commit cb4a14f

1 file changed, +3 -3 lines changed

benchmark/benchmark_occupancy_helper.hpp

Lines changed: 3 additions & 3 deletions
@@ -90,10 +90,10 @@ inline launch_params get_benchmark_launch_parameters(T kernel,
     else
     {
         // Heuristic that picks thread count that maximizes occupancy
-        const std::vector<int> thread_options = {32, 64, 128, 256, 512, 1024};
-        for(int t : thread_options)
+        hipFuncAttributes attr;
+        HIP_CHECK(hipFuncGetAttributes(&attr, (const void*)kernel));
+        for(int t = 32; t <= attr.maxThreadsPerBlock; t *= 2)
         {
-
             if(t > params.max_threads_per_block)
                 continue;
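
The effect of the change on the set of candidate block sizes can be illustrated without a GPU. The sketch below is host-only C++ and is not code from the repository: the function names `candidates_fixed_list` and `candidates_kernel_capped` are hypothetical, `kernel_max_threads` stands in for `attr.maxThreadsPerBlock`, and `device_max_threads` stands in for `params.max_threads_per_block`. It only models which thread counts the loop would consider, not the occupancy heuristic that picks among them.

```cpp
#include <vector>

// Models the loop before the fix: a fixed list of candidates, filtered only
// by the device-wide limit (stand-in for params.max_threads_per_block).
std::vector<int> candidates_fixed_list(int device_max_threads)
{
    std::vector<int> result;
    for(int t : {32, 64, 128, 256, 512, 1024})
    {
        if(t > device_max_threads)
            continue;
        result.push_back(t);
    }
    return result;
}

// Models the loop after the fix: powers of two starting at 32, bounded by
// the kernel's own limit (stand-in for attr.maxThreadsPerBlock), then
// filtered by the same device-wide limit as before.
std::vector<int> candidates_kernel_capped(int kernel_max_threads, int device_max_threads)
{
    std::vector<int> result;
    for(int t = 32; t <= kernel_max_threads; t *= 2)
    {
        if(t > device_max_threads)
            continue;
        result.push_back(t);
    }
    return result;
}
```

With a hypothetical kernel limit of 256 and a device limit of 1024, `candidates_fixed_list(1024)` still offers 512 and 1024, either of which would exceed the kernel's launch bounds, while `candidates_kernel_capped(256, 1024)` stops at 256. When the kernel limit equals the device limit, both functions return the same candidates, so the fix should only change behavior for kernels with tighter bounds.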
