[PR4] Ray-benchmark tool to test throughput of batch query API added in #195 by Waqar-ukaea · Pull Request #194 · xdg-org/xdg

Waqar-ukaea · 2026-01-30T13:11:25Z

This PR adds a new tool/miniapp that I have been using to benchmark the pure ray throughput of the batch query API on various GPUs.

It adds tools/ray_benchmark/, which includes:

ray_benchmark.cpp - The main application
ray_benchmark.h - Header file containing some helper functions, including the definition of the callback method which "mocks" an external application filling XDG's internal ray buffers
ray_benchmark_deviceCode.slang - Slang compute shader for filling XDG's internal ray buffer on device
ray_benchmark_shared.h - Shared code between host and device
ray_benchmark_driver.py - A Python script that drives multiple runs of the ray_benchmark miniapp so that results can be averaged for more consistent measurements
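As a rough illustration of what driving multiple runs involves, the sketch below averages trace times over repeated runs. It is not the code in ray_benchmark_driver.py: `summarise_runs` and `run_once` are hypothetical names, and because the miniapp's command line and output format are not described here, the actual launch is replaced by a stand-in callable that returns a trace time in seconds.

```python
import statistics
from typing import Callable, Dict


def summarise_runs(run_once: Callable[[], float], n_rays: int, n_runs: int) -> Dict[str, float]:
    """Call run_once() n_runs times and summarise the trace times (seconds) it returns."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    times = [run_once() for _ in range(n_runs)]
    # Per-run throughput in rays per second.
    throughputs = [n_rays / t for t in times]
    return {
        "mean_time_s": statistics.fmean(times),
        # Sample standard deviation needs at least two runs.
        "stdev_time_s": statistics.stdev(times) if n_runs > 1 else 0.0,
        "mean_throughput_rays_per_s": statistics.fmean(throughputs),
    }


# Stand-in for launching the miniapp (assumption: a real driver would start
# the ray_benchmark binary and parse the trace time from its output).
_fake_times = iter([0.20, 0.19, 0.21])
summary = summarise_runs(lambda: next(_fake_times), n_rays=1_000_000, n_runs=3)
print(summary)
```

Passing the launch step in as a callable keeps the averaging logic independent of how the miniapp is invoked.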

Copied from #178:

Benchmark parameters

Model	Volume	No. of Elements	No. of Rays	Location	No. of Runs
simple_tokamak	2	280K	80M	(180,250,-27)	100

The image below shows a render of the simple_tokamak model [1] used in these preliminary benchmarks, together with a depiction of the rays launched (far fewer rays are plotted than were traced). The volume queried against is highlighted in blue:

Ray tracer performance (trace-only)

Baseline = Embree (CPU), 2× Intel® Xeon® Platinum 8480+ (Sapphire Rapids) × 112 threads

Times and throughput averaged over the 100 runs.

Ray Tracer backend	Hardware (Threads / Device)	Trace Time (s)	Throughput (rays/s)	Speedup vs 2×8480+ (112-thread)	Peak FP32/FP64 (TFLOPS) + RT cores	Node %
Embree	13th Gen Intel® Core™ i7-13850HX × 28 threads	0.749128	1.06791e+08	~0.25×	N/A	Not HPC
Embree	1× Intel® Xeon® Platinum 8480+ (Sapphire Rapids) × 56 threads	0.367881	2.17462e+08	~0.51×	N/A	50%
Embree	2× Intel® Xeon® Platinum 8480+ (Sapphire Rapids) × 112 threads	0.18914	4.22967e+08	1× (baseline)	N/A	100%
GPRT (FP64)	NVIDIA RTX 2000 Ada	1.11261	7.19033e+07	~0.17× (baseline faster)	FP32: 12.0 FP64: 0.19 RT cores: 22	Not HPC
GPRT (FP32 + RT cores)	NVIDIA RTX 2000 Ada	0.0302956	2.64065e+09	~6.24×	FP32: 12.0 FP64: 0.19 RT cores: 22	Not HPC
GPRT (FP64)	NVIDIA L40	0.185930	4.319e+08	~1.02×	FP32: 90.5 FP64: 1.41 RT cores: 142	25%
GPRT (FP32 + RT cores)	NVIDIA L40	0.008051	9.954e+09	~23.5×	FP32: 90.5 FP64: 1.41 RT cores: 142	25%
GPRT (FP64)	NVIDIA A100	0.183721	4.506e+08	~1.07×	FP32: 19.5 FP64: 9.7 RT cores: N/A	25%
GPRT (FP64)	AMD MI300X - No VK ray tracing support	N/A	N/A	N/A	N/A	N/A
GPRT (FP64)	Intel Data Center GPU Max 1100 (Ponte Vecchio) - Vulkan doesn’t recognise PVCs as physical devices	N/A	N/A	N/A	N/A	N/A
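The speedup column can be sanity-checked from the trace times alone. The sketch below assumes the speedup is the ratio of the baseline trace time to the backend's trace time; the function name `speedup` is only for illustration, and the times are copied from the table above.

```python
# Baseline: Embree on 2x Xeon Platinum 8480+ (112 threads), trace time from the table.
BASELINE_TIME_S = 0.18914

# A few trace times (seconds) copied from the table above.
TRACE_TIMES_S = {
    "Embree, i7-13850HX (28 threads)": 0.749128,
    "GPRT FP32 + RT cores, RTX 2000 Ada": 0.0302956,
    "GPRT FP32 + RT cores, L40": 0.008051,
}


def speedup(trace_time_s: float, baseline_time_s: float = BASELINE_TIME_S) -> float:
    """Speedup relative to the baseline; values above 1 mean faster than baseline."""
    if trace_time_s <= 0:
        raise ValueError("trace time must be positive")
    return baseline_time_s / trace_time_s


for name, t in TRACE_TIMES_S.items():
    print(f"{name}: ~{speedup(t):.2f}x")
```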

Next steps: It is probably worth coming up with a more computationally intensive benchmark problem. For this simple ray-throughput case, that may just mean increasing the number of rays. However, I am limited by memory on the RTX 2000 Ada, which only has 8 GB, so I will have to think about what would be more suitable.
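One way to reason about the memory limit is to estimate how many rays fit in a given buffer budget and split a larger ray count into batches. The sketch below is back-of-envelope arithmetic only: the per-ray layout (origin, direction, hit distance, and a 4-byte element id) is a hypothetical assumption, not XDG's actual ray buffer layout, and `bytes_per_ray` and `plan_batches` are illustrative names.

```python
from typing import List


def bytes_per_ray(float_bytes: int) -> int:
    """Bytes per ray under an ASSUMED layout (not XDG's real one):
    origin (3 floats) + direction (3 floats) + hit distance (1 float) + 4-byte element id."""
    return 7 * float_bytes + 4


def plan_batches(total_rays: int, ray_bytes: int, budget_bytes: int) -> List[int]:
    """Split total_rays into batch sizes that each fit within budget_bytes."""
    per_batch = budget_bytes // ray_bytes
    if per_batch < 1:
        raise ValueError("budget is too small to hold a single ray")
    full, rest = divmod(total_rays, per_batch)
    return [per_batch] * full + ([rest] if rest else [])


# Example: 200M FP64 rays with a 6 GiB ray-buffer budget. The budget is an
# assumed figure that leaves headroom on an 8 GB card for the mesh and
# acceleration structures.
batches = plan_batches(200_000_000, bytes_per_ray(8), 6 * 1024**3)
print(len(batches), "batches:", batches)
```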

In the FP64 runs, performance seems to cap out at ~4e+08 rays/s (L40 and A100) no matter how much more theoretical performance the card has. Increasing the number of rays fired beyond 80M seems to have no positive impact on this metric.

References

[1] Valentine, A., Berry, T., Bradnam, S., Hagues, J., & Hodson, J. (2022). Benchmarking of emergent radiation transport codes for fusion neutronics applications. Fusion Engineering and Design, 180, 113197. https://doi.org/10.1016/j.fusengdes.2022.113197

Added ray-benchmark tool to test implementation ray throughput across…

0eefffe

… different GPUs

This was referenced Jan 30, 2026

Adding a ray batch query API #178

Draft

Core API changes to allow batch queries with XDG (PR 1) #191

Closed

[PR3] Associated extra debugging tools for batch query API added in #195 #193

Open

Waqar-ukaea changed the title ~~Ray-benchmark tool to test throughput of batch query API added in #191 (PR 4)~~ [PR4] Ray-benchmark tool to test throughput of batch query API added in #191 Feb 4, 2026

Waqar-ukaea changed the title ~~[PR4] Ray-benchmark tool to test throughput of batch query API added in #191~~ [PR4] Ray-benchmark tool to test throughput of batch query API added in #194 Feb 4, 2026

Waqar-ukaea changed the title ~~[PR4] Ray-benchmark tool to test throughput of batch query API added in #194~~ [PR4] Ray-benchmark tool to test throughput of batch query API added in #195 Feb 4, 2026

Waqar-ukaea wants to merge 1 commit into xdg-org:main from Waqar-ukaea:pr4-batch-query-benchmark-tool