Achieving 1 Billion Row Spatial Joins on a Single GPU with H3-Turbo (SYCL) #1137
cflockhart started this conversation in Show and tell
Hi everyone,
I wanted to share an exciting milestone we’ve reached with h3-turbo, a project we've been working on.
We set out to see how far we could push H3's performance by moving core operations entirely to the GPU. To benchmark this, we replicated Query 11 from SpatialBench (a massive Point-in-Polygon spatial join commonly used to benchmark distributed systems like Apache Sedona).
The Benchmark:
Dataset: 1.1 Billion points (pings) joined against 100 Million polygons (zones).
Hardware: A single consumer-grade GPU node (e.g., an RTX 4090 with 24 GB VRAM and >20 GB system RAM on RunPod).
Framework: Python, utilizing h3-turbo with dynamic GPU batching to prevent OOM errors.
The Result: Instead of spinning up a multi-node Spark cluster with massive memory and S3 overhead, we successfully executed the entire 1.1B x 100M spatial join on a single machine.
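To make the dynamic-batching idea concrete, here is a minimal CPU-runnable sketch of the driver loop. The `process_batch` stand-in and the per-row byte estimate are illustrative assumptions, not h3-turbo's actual API:

```python
import numpy as np

# Minimal sketch of dynamic batching: split a huge uint64 cell array into
# batches sized against a memory budget so each GPU launch fits in VRAM.
# `process_batch` is a hypothetical stand-in for the real GPU join kernel;
# here it just echoes its input so the driver loop is runnable on CPU.
def process_batch(cells: np.ndarray) -> np.ndarray:
    return cells  # placeholder for the GPU point-in-polygon join

def batched_run(cells: np.ndarray, budget_bytes: int,
                bytes_per_row: int = 64) -> np.ndarray:
    # bytes_per_row is an assumed working-set estimate per input row.
    batch_rows = max(1, budget_bytes // bytes_per_row)
    parts = [process_batch(cells[i:i + batch_rows])
             for i in range(0, len(cells), batch_rows)]
    return np.concatenate(parts)

cells = np.arange(1_000_000, dtype=np.uint64)
result = batched_run(cells, budget_bytes=8 * 1024 * 1024)  # toy 8 MiB budget
```

On a 24 GB card the same loop simply gets a larger budget; the point is that the driver never materializes more than one batch on the device at a time, which is what keeps a 1.1B-row input from OOMing.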
How it works under the hood:
SYCL / AdaptiveCpp: We ported core H3 logic (like latlng_to_cell and cell_to_parent) into SYCL kernels. This allows the code to run natively on NVIDIA, AMD, and Intel GPUs from a single codebase, although it has so far only been tested on NVIDIA due to the scarcity of AMD and Intel GPUs on sites like RunPod. We can create builds for AMD and Intel for anyone who wants to try those architectures.
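For a feel of what porting core H3 logic involves, here is a vectorized NumPy sketch of `cell_to_parent` following the published H3 index bit layout (a 4-bit resolution field at bits 52-55, fifteen 3-bit digits below it, with unused digits set to 0b111). This is a CPU reference for illustration, not the library's actual SYCL kernel:

```python
import numpy as np

# H3 stores the cell resolution in bits 52-55 of the uint64 index.
H3_RES_OFFSET = np.uint64(52)
H3_RES_MASK = np.uint64(0xF) << H3_RES_OFFSET

def get_resolution(h: np.ndarray) -> np.ndarray:
    return (h >> H3_RES_OFFSET) & np.uint64(0xF)

def cell_to_parent(h: np.ndarray, parent_res: int) -> np.ndarray:
    # 1. Overwrite the resolution field with the parent resolution.
    out = (h & ~H3_RES_MASK) | (np.uint64(parent_res) << H3_RES_OFFSET)
    # 2. Mark every digit finer than parent_res as unused (0b111).
    #    Digit r (1-indexed) lives at bits (15 - r) * 3 .. + 2.
    for r in range(parent_res + 1, 16):
        out = out | (np.uint64(0x7) << np.uint64((15 - r) * 3))
    return out

# Res-10 cell -> its res-5 ancestor, as plain uint64 bit manipulation.
parent = cell_to_parent(np.array([0x8A2A1072B59FFFF], dtype=np.uint64), 5)
```

A SYCL kernel would presumably run the same bit manipulation per work-item over the device array; it is pure integer arithmetic with no lookups, which is what makes these operations so GPU-friendly.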
Zero-Copy Python Bindings: The library takes standard NumPy arrays (uint64) directly from Python and pushes them to the GPU.
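To show what "zero-copy" means at the binding layer, here is a small demonstration of the mechanism: a contiguous NumPy uint64 array exposes a raw data pointer that a C/SYCL extension can read in place, with no copy. How h3-turbo consumes that pointer internally is an assumption; only the NumPy/ctypes mechanics below are shown:

```python
import ctypes
import numpy as np

# A contiguous uint64 array of H3 cells, as the bindings would receive it.
cells = np.array([0x8A2A1072B59FFFF, 0x8A2A1072B597FFF], dtype=np.uint64)
assert cells.flags["C_CONTIGUOUS"]  # bindings typically require contiguity

# The raw pointer a native extension would receive (no data is copied):
ptr = cells.ctypes.data_as(ctypes.POINTER(ctypes.c_uint64))
first = ptr[0]  # reading through the pointer sees the same memory
```

Because the pointer aliases the array's own buffer, writes on either side are immediately visible on the other, which is exactly the property that lets the library push arrays to the GPU without an intermediate copy on the host.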
PySpark Pandas UDFs: For those already using Databricks or Spark, we built vectorized Pandas UDF wrappers (e.g., spatial_join_udf, latlng_to_cell_udf) that map distributed DataFrame partitions directly onto the GPU hardware.
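The vectorized UDF pattern can be sketched as follows. `fake_latlng_to_cell` is a hypothetical stand-in for the GPU call (it returns zeros, not real H3 cells); the shape of the wrapper is the point, since it is what lets each Spark partition map onto one GPU launch:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the GPU kernel; real bindings would push the
# arrays to the device and return computed H3 cells.
def fake_latlng_to_cell(lat: np.ndarray, lng: np.ndarray, res: int) -> np.ndarray:
    return np.zeros(len(lat), dtype=np.uint64)  # placeholder, not real H3 math

def latlng_to_cell_pandas(lat: pd.Series, lng: pd.Series) -> pd.Series:
    # Each Spark partition arrives as pandas Series; forward the whole
    # partition to the device in one vectorized call, then hand back a Series.
    cells = fake_latlng_to_cell(lat.to_numpy(), lng.to_numpy(), res=9)
    return pd.Series(cells)

# On Spark, a function of this shape is registered with
# pyspark.sql.functions.pandas_udf, so partitions stream through Arrow
# batches instead of a Python loop per row.
```

The key design choice is that the UDF body never touches individual rows: everything stays as NumPy arrays end to end, so the per-row Python overhead that normally dominates Spark UDFs disappears.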
If anyone is dealing with massive-scale geospatial joins or heavy H3 aggregations and wants to dramatically reduce their compute footprint (moving from a cluster to a single GPU), I'd love to hear your thoughts or use cases!
Sample Jupyter notebooks with results at: