[PR4] Ray-benchmark tool to test throughput of batch query API added in #195 #194
Open
Waqar-ukaea wants to merge 1 commit into xdg-org:main
This was referenced Jan 30, 2026
This PR adds a new tool/miniapp that I have been using to benchmark the pure ray throughput of the batch query API on various GPUs.
This PR adds tools/ray_benchmark/ which includes:

- ray_benchmark.cpp - The main application
- ray_benchmark.h - Header file containing some helper functions, including the definition of the callback method which "mocks" an external application filling XDG's internal ray buffers
- ray_benchmark_deviceCode.slang - Slang compute shader for filling XDG's internal ray buffer on device
- ray_benchmark_shared.h - Shared code between host and device
- ray_benchmark_driver.py - A Python script used to drive multiple runs of the ray_benchmark miniapp for more consistent measurements

Copied from #178:
Benchmark parameters
A render of the simple_tokamak model [1] used in these preliminary benchmarks is shown in the image below, along with a depiction of the rays launched (far fewer rays are plotted than were actually traced) and the volume queried against, highlighted in blue:

Ray tracer performance (trace-only)
Baseline = Embree (CPU), 2× Intel® Xeon® Platinum 8480+ (Sapphire Rapids) × 112 threads
Times and throughput averaged over the 100 runs.
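As a rough illustration of the averaging described above, the sketch below reduces a list of per-run trace times to a mean time and a mean throughput. The function name `summarize_runs` and the timings in the usage example are hypothetical and are not taken from `ray_benchmark_driver.py`; the actual driver may aggregate differently.

```python
from statistics import mean, stdev


def summarize_runs(trace_times_s, n_rays):
    """Average trace time and ray throughput over repeated runs.

    trace_times_s: per-run trace-only wall times in seconds (hypothetical input).
    n_rays: number of rays traced in each run.
    """
    if not trace_times_s:
        raise ValueError("need at least one run")
    # Throughput is computed per run, then averaged across runs.
    throughputs = [n_rays / t for t in trace_times_s]
    spread = stdev(throughputs) if len(throughputs) > 1 else 0.0
    return {
        "mean_time_s": mean(trace_times_s),
        "mean_throughput_rays_per_s": mean(throughputs),
        "stdev_throughput_rays_per_s": spread,
    }


if __name__ == "__main__":
    # Made-up timings for illustration only, not measured results.
    print(summarize_runs([0.20, 0.21, 0.19], n_rays=80_000_000))
```

Note that the mean of per-run throughputs is not the same quantity as total rays divided by total time; which one a report uses is worth stating alongside the numbers.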
[Results table not recovered. Surviving per-row GPU specifications: FP64 0.19, RT cores 22 (two rows); FP64 1.41, RT cores 142 (two rows); FP64 9.7, RT cores N/A (one row).]
Next Steps - It's probably worth coming up with a more computationally intensive benchmark problem. For this simple ray-throughput case, that might just mean increasing the number of rays. However, I am memory bound on the RTX 2000 Ada, which only has 8GB, so I'll have to think about what would be more suitable.
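To get a feel for the memory constraint, here is a back-of-the-envelope sketch of ray buffer size. The per-ray layout (origin plus direction, six scalars) and the scalar width are assumptions made for illustration; XDG's actual internal ray buffer layout is not described here, and geometry, acceleration structures and result buffers also consume device memory.

```python
def ray_buffer_bytes(n_rays, scalars_per_ray=6, bytes_per_scalar=8,
                     result_bytes_per_ray=0):
    """Estimate bytes needed to hold n_rays under an ASSUMED layout.

    scalars_per_ray: 3 for origin + 3 for direction (assumption).
    bytes_per_scalar: 8 for double precision, 4 for single precision.
    result_bytes_per_ray: optional extra space for per-ray outputs.
    """
    return n_rays * (scalars_per_ray * bytes_per_scalar + result_bytes_per_ray)


def max_rays_that_fit(memory_bytes, bytes_per_ray):
    """Largest whole number of rays whose records fit in memory_bytes."""
    return memory_bytes // bytes_per_ray


if __name__ == "__main__":
    n = 80_000_000
    size = ray_buffer_bytes(n)
    print(f"{n} rays need about {size / 1024**3:.2f} GiB under the assumed layout")
```

Under these assumptions the input rays alone already take a large share of an 8GB card, so scaling the ray count further would likely need batching or a more compact layout.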
Performance seems to cap out at ~4e+08 rays/sec no matter how much more theoretical performance the card has. Increasing the number of rays fired beyond 80M seems to have no positive impact on this metric.
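One way to check a plateau like this is to compare throughput across increasing ray counts and see whether it stays within a tolerance. The sketch below is a minimal, hypothetical helper (`has_plateaued` and the 5% tolerance are illustrative choices, not part of the benchmark tool), and the sample timings are made up. For scale, 80M rays at 4e+08 rays/sec corresponds to roughly 0.2 s of trace time.

```python
def throughput(n_rays, trace_time_s):
    """Rays traced per second for a single measurement."""
    return n_rays / trace_time_s


def has_plateaued(samples, rel_tol=0.05):
    """Return True if throughput changes by at most rel_tol between
    consecutive ray counts.

    samples: iterable of (n_rays, trace_time_s) pairs (hypothetical inputs).
    """
    ordered = sorted(samples)  # sort by ray count
    if len(ordered) < 2:
        raise ValueError("need at least two measurements")
    rates = [throughput(n, t) for n, t in ordered]
    return all(abs(b - a) / a <= rel_tol for a, b in zip(rates, rates[1:]))


if __name__ == "__main__":
    # Made-up timings for illustration only.
    measurements = [(80_000_000, 0.20), (160_000_000, 0.40)]
    print(has_plateaued(measurements))
```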
References
[1] Valentine, A., Berry, T., Bradnam, S., Hagues, J., & Hodson, J. (2022). Benchmarking of emergent radiation transport codes for fusion neutronics applications. Fusion Engineering and Design, 180, 113197. https://doi.org/10.1016/j.fusengdes.2022.113197