[PR4] Ray-benchmark tool to test throughput of batch query API added in #195 #194

Open
Waqar-ukaea wants to merge 1 commit into xdg-org:main from Waqar-ukaea:pr4-batch-query-benchmark-tool

Conversation

@Waqar-ukaea
Collaborator

This PR adds a new tool/miniapp that I have been using to benchmark the pure ray throughput of the batch query API on various GPUs.

This PR adds tools/ray_benchmark/ which includes:

  • ray_benchmark.cpp - The main application
  • ray_benchmark.h - Header file containing some helper functions, including the definition of the callback method which "mocks" an external application filling XDG's internal ray buffers
  • ray_benchmark_deviceCode.slang - Slang compute shader for filling XDG's internal ray buffer on device
  • ray_benchmark_shared.h - Shared code between host and device
  • ray_benchmark_driver.py - A Python script used to drive multiple runs of the ray_benchmark miniapp for more consistent measurements.
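
Averaging over repeated runs, which is the role of the driver script, could look something like the sketch below. This is not the code in ray_benchmark_driver.py: `summarise_runs` and `run_once` are hypothetical names, and `run_once` stands in for whatever launches the miniapp once and returns its trace time in seconds.

```python
import statistics
from typing import Callable, Dict


def summarise_runs(run_once: Callable[[], float], n_rays: int, n_runs: int = 100) -> Dict[str, float]:
    """Call run_once n_runs times and average trace time and throughput."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    times = [run_once() for _ in range(n_runs)]
    throughputs = [n_rays / t for t in times]  # rays per second for each run
    return {
        "mean_time_s": statistics.fmean(times),
        "mean_throughput_rays_per_s": statistics.fmean(throughputs),
        "stdev_time_s": statistics.stdev(times) if n_runs > 1 else 0.0,
    }


# Stand-in for a real run: replays fixed timings instead of launching the miniapp.
fake_times = iter([0.20, 0.25])
summary = summarise_runs(lambda: next(fake_times), n_rays=1_000_000, n_runs=2)
print(summary)
```

Note that the mean of the per-run throughputs is not the same as the ray count divided by the mean time (here 4.5e6 versus roughly 4.44e6 rays/s), so it helps to be explicit about which of the two is reported.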

Copied from #178:

Benchmark parameters

| Model | Volume | No. of Elements | No. of Rays | Location | No. of Runs |
|---|---|---|---|---|---|
| simple_tokamak | 2 | 280K | 50M | (180, 250, -27) | 100 |

A render of the simple_tokamak model [1] used in these preliminary benchmarks is shown in the image below, together with a depiction of the rays launched (far fewer rays are plotted than were launched). The volume queried against is highlighted in blue:
[Image: ray_benchmark_tokamak_setup]

Ray tracer performance (trace-only)

Baseline = Embree (CPU), 2× Intel® Xeon® Platinum 8480+ (Sapphire Rapids) × 112 threads

Times and throughput averaged over the 100 runs.

| Ray Tracer backend | Hardware (Threads / Device) | Trace Time (s) | Throughput (rays/s) | Speedup vs 2×8480+ (112-thread) | Peak FP32/FP64 (TFLOPS) + RT cores | Node % |
|---|---|---|---|---|---|---|
| Embree | 13th Gen Intel® Core™ i7-13850HX × 28 threads | 0.749128 | 1.06791e+08 | ~0.25× | N/A | Not HPC |
| Embree | 1× Intel® Xeon® Platinum 8480+ (Sapphire Rapids) × 56 threads | 0.367881 | 2.17462e+08 | ~0.51× | N/A | 50% |
| Embree | 2× Intel® Xeon® Platinum 8480+ (Sapphire Rapids) × 112 threads | 0.18914 | 4.22967e+08 | 1× (baseline) | N/A | 100% |
| GPRT (FP64) | NVIDIA RTX 2000 Ada | 1.11261 | 7.19033e+07 | ~0.17× (baseline faster) | FP32: 12.0, FP64: 0.19, RT cores: 22 | Not HPC |
| GPRT (FP32 + RT cores) | NVIDIA RTX 2000 Ada | 0.0302956 | 2.64065e+09 | ~6.24× | FP32: 12.0, FP64: 0.19, RT cores: 22 | Not HPC |
| GPRT (FP64) | NVIDIA L40 | 0.185930 | 4.319e+08 | ~1.02× | FP32: 90.5, FP64: 1.41, RT cores: 142 | 25% |
| GPRT (FP32 + RT cores) | NVIDIA L40 | 0.008051 | 9.954e+09 | ~23.5× | FP32: 90.5, FP64: 1.41, RT cores: 142 | 25% |
| GPRT (FP64) | NVIDIA A100 | 0.183721 | 4.506e+08 | ~1.07× | FP32: 19.5, FP64: 9.7, RT cores: N/A | 25% |
| GPRT (FP64) | AMD MI300X - No VK ray tracing support | N/A | N/A | N/A | N/A | N/A |
| GPRT (FP64) | Intel Data Center GPU Max 1100 (Ponte Vecchio) - Vulkan doesn't recognise PVCs as physical devices | N/A | N/A | N/A | N/A | N/A |
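
The speedup column can be reproduced from the throughput column: each entry is that backend's mean throughput divided by the 2×8480+ Embree baseline. A minimal sketch, with values copied from the results table above; the dictionary labels and the `speedup` helper are illustrative names, not part of the tool:

```python
# Mean throughput (rays/s) copied from the results table above.
BASELINE = 4.22967e08  # Embree, 2x Xeon Platinum 8480+, 112 threads

throughput = {
    "Embree i7-13850HX (28 threads)": 1.06791e08,
    "Embree 1x 8480+ (56 threads)": 2.17462e08,
    "GPRT FP64, RTX 2000 Ada": 7.19033e07,
    "GPRT FP32 + RT cores, RTX 2000 Ada": 2.64065e09,
    "GPRT FP64, L40": 4.319e08,
    "GPRT FP32 + RT cores, L40": 9.954e09,
    "GPRT FP64, A100": 4.506e08,
}


def speedup(rays_per_second: float, baseline: float = BASELINE) -> float:
    """Return throughput relative to the baseline (values above 1 are faster)."""
    return rays_per_second / baseline


for name, rate in throughput.items():
    print(f"{name:36s} {speedup(rate):6.2f}x")
```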

Next Steps - It's probably worth coming up with a more computationally intense benchmark problem. For this simple ray-throughput case, that might just mean increasing the number of rays. However, I am limited by memory on the RTX 2000 Ada, which only has 8GB, so I'll have to think about what would be more suitable.
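
To get a rough feel for how the ray count runs into the 8GB limit, the sketch below estimates the size of the ray data alone. The per-ray layout is purely an assumption for illustration (origin and direction as three scalars each, plus a hit distance and a 32-bit surface id); it is not taken from XDG's actual buffers, and `ray_buffer_bytes` is a hypothetical helper.

```python
def ray_buffer_bytes(n_rays: int, bytes_per_scalar: int) -> int:
    """Estimate bytes for n_rays under an ASSUMED per-ray layout (not XDG's actual one)."""
    inputs = 6 * bytes_per_scalar   # origin (x, y, z) + direction (x, y, z)
    outputs = bytes_per_scalar + 4  # hit distance + 32-bit surface id
    return n_rays * (inputs + outputs)


GIB = 1024 ** 3
for n_rays in (80_000_000, 160_000_000):
    fp64 = ray_buffer_bytes(n_rays, 8) / GIB  # 8-byte doubles
    fp32 = ray_buffer_bytes(n_rays, 4) / GIB  # 4-byte floats
    print(f"{n_rays:>11,d} rays: FP64 ~{fp64:.2f} GiB, FP32 ~{fp32:.2f} GiB")
```

Under these assumptions, 160M rays in FP64 would need about 9.6 GB for the ray data alone, before counting the geometry and acceleration structure.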

In the FP64 runs on the L40 and A100, performance seems to cap out at ~4e+08 rays/s no matter how much more theoretical performance the card has. Increasing the number of rays fired beyond 80M seems to have no positive impact on this metric.

References

[1] Valentine, A., Berry, T., Bradnam, S., Hagues, J., & Hodson, J. (2022). Benchmarking of emergent radiation transport codes for fusion neutronics applications. Fusion Engineering and Design, 180, 113197. https://doi.org/10.1016/j.fusengdes.2022.113197

@Waqar-ukaea changed the title from "Ray-benchmark tool to test throughput of batch query API added in #191 (PR 4)" to "[PR4] Ray-benchmark tool to test throughput of batch query API added in #191" (Feb 4, 2026)
@Waqar-ukaea changed the title from "[PR4] Ray-benchmark tool to test throughput of batch query API added in #191" to "[PR4] Ray-benchmark tool to test throughput of batch query API added in #194" (Feb 4, 2026)
@Waqar-ukaea changed the title from "[PR4] Ray-benchmark tool to test throughput of batch query API added in #194" to "[PR4] Ray-benchmark tool to test throughput of batch query API added in #195" (Feb 4, 2026)