Add native Dask+CuPy backends for hydrology core functions (#952) #966

Merged
brendancol merged 1 commit into master from issue-952
Mar 4, 2026


@brendancol

Summary

  • Replaces CPU fallback with native GPU tile kernels for six hydrology functions (flow_accumulation, watershed, basin, stream_order, stream_link, snap_pour_point) when running on Dask+CuPy arrays
  • Each tile now runs existing CUDA kernels directly with seed injection at boundaries, keeping data GPU-resident through the iterative tile sweep
  • Adds a native CUDA kernel for snap_pour_point's single-GPU CuPy path (previously fell back to CPU)
  • Updates README feature matrix: all six functions now show native support across all four backends

What changed

Per-tile GPU kernels with seed injection: The Dask+CuPy path previously converted CuPy chunks to NumPy, ran the CPU tile kernel, then converted back. Now each tile runs the same GPU frontier-peeling kernels used by the single-GPU path, with external boundary values injected before the peeling loop starts. Only the boundary seeds (small O(edge_length) strips) pass through the CPU between tiles; all tile-interior computation stays on the GPU.
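To illustrate the idea of seed injection, here is a minimal CPU sketch in NumPy, not the CUDA kernels from this PR. It assumes a simplified flow encoding in which each cell drains east, drains south, or is a sink; the names `accumulate_tile`, `tiled_accumulation`, and the constants are hypothetical and exist only for this example. Each tile adds the inflow from its neighbours' edge strips before a frontier-peeling loop runs, and the sweep over tiles repeats until nothing changes.

```python
import numpy as np

# Simplified two-direction flow encoding (an assumption for this sketch only).
SINK, EAST, SOUTH = 0, 1, 2


def accumulate_tile(flow_dir, seed_west, seed_north):
    """Flow accumulation inside one tile, with boundary seeds injected first."""
    rows, cols = flow_dir.shape
    acc = np.ones((rows, cols), dtype=np.float64)  # every cell counts itself
    acc[:, 0] += seed_west    # inflow entering across the west edge
    acc[0, :] += seed_north   # inflow entering across the north edge

    # In-degree: number of in-tile neighbours that drain into each cell.
    indeg = np.zeros((rows, cols), dtype=np.int64)
    indeg[:, 1:] += flow_dir[:, :-1] == EAST
    indeg[1:, :] += flow_dir[:-1, :] == SOUTH

    done = np.zeros((rows, cols), dtype=bool)
    while True:
        # Frontier: cells whose upstream contributions are all resolved.
        frontier = (indeg == 0) & ~done
        if not frontier.any():
            break
        for r, c in zip(*np.nonzero(frontier)):
            if flow_dir[r, c] == EAST and c + 1 < cols:
                acc[r, c + 1] += acc[r, c]
                indeg[r, c + 1] -= 1
            elif flow_dir[r, c] == SOUTH and r + 1 < rows:
                acc[r + 1, c] += acc[r, c]
                indeg[r + 1, c] -= 1
        done |= frontier
    return acc


def tiled_accumulation(flow_dir, tile):
    """Sweep over tiles repeatedly until no tile's result changes."""
    rows, cols = flow_dir.shape
    acc = np.zeros((rows, cols), dtype=np.float64)
    changed = True
    while changed:
        changed = False
        for r0 in range(0, rows, tile):
            for c0 in range(0, cols, tile):
                r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
                seed_w = np.zeros(r1 - r0)
                seed_n = np.zeros(c1 - c0)
                # Seeds are O(edge_length) strips read from neighbouring tiles.
                if c0 > 0:
                    seed_w = np.where(flow_dir[r0:r1, c0 - 1] == EAST,
                                      acc[r0:r1, c0 - 1], 0.0)
                if r0 > 0:
                    seed_n = np.where(flow_dir[r0 - 1, c0:c1] == SOUTH,
                                      acc[r0 - 1, c0:c1], 0.0)
                new = accumulate_tile(flow_dir[r0:r1, c0:c1], seed_w, seed_n)
                if not np.array_equal(new, acc[r0:r1, c0:c1]):
                    acc[r0:r1, c0:c1] = new
                    changed = True
    return acc
```

In this sketch, calling `accumulate_tile` on the whole grid with zero seeds gives the reference answer, and `tiled_accumulation` should reproduce it for any tile size. In the PR the per-tile step is a GPU kernel rather than a Python loop, but the data flow (edge strips in, peeling loop, edge strips out) follows the same pattern as described above.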

snap_pour_point native CuPy: Added _snap_pour_point_gpu CUDA kernel where each thread handles one pour point's windowed max search. The flow accumulation array stays on GPU instead of being pulled to CPU.
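As a rough CPU reference for the windowed max search, the following NumPy sketch moves each pour point to the cell with the highest flow accumulation inside a square window. It is an illustration only: the function name `snap_pour_points_reference`, the `radius` parameter, the NaN-marks-empty convention, and the square window shape are assumptions, not the signature or behaviour of `_snap_pour_point_gpu`. Each iteration of the loop corresponds to the work one GPU thread would do.

```python
import numpy as np


def snap_pour_points_reference(flow_accum, pour_points, radius):
    """Snap each labelled pour point to the max-accumulation cell nearby.

    Assumes `flow_accum` has no NaNs and `pour_points` is NaN everywhere
    except at pour-point cells, which hold a label value.
    """
    rows, cols = flow_accum.shape
    out = np.full(pour_points.shape, np.nan)
    # One loop iteration per pour point (one thread each on the GPU).
    for r, c in zip(*np.nonzero(~np.isnan(pour_points))):
        # Clamp the search window to the raster bounds.
        r0, r1 = max(r - radius, 0), min(r + radius + 1, rows)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, cols)
        window = flow_accum[r0:r1, c0:c1]
        wr, wc = np.unravel_index(np.argmax(window), window.shape)
        out[r0 + wr, c0 + wc] = pour_points[r, c]
    return out
```

Because each pour point only reads a small window and writes a single cell, the searches are independent of one another, which is what makes a one-thread-per-point mapping natural.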

stream_link tile-aware kernel: Added _stream_link_find_ready_tile CUDA kernel that uses global coordinate offsets for position-based link IDs, so tile-local results are consistent with full-array results.
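The reason global offsets matter can be shown with a small NumPy sketch. It assumes, purely for illustration, that a link ID is derived from a cell's row-major position in the full raster; the actual ID formula used by `_stream_link_find_ready_tile` is not stated here, and `position_link_ids` and its parameters are hypothetical names.

```python
import numpy as np


def position_link_ids(is_link_start, row_offset=0, col_offset=0, total_cols=None):
    """Assign position-based IDs using global (full-raster) coordinates.

    `row_offset` and `col_offset` locate the tile inside the full raster;
    `total_cols` is the full raster width. The ID formula is an assumption.
    """
    rows, cols = is_link_start.shape
    if total_cols is None:
        total_cols = cols  # treat the input as the full raster
    global_r, global_c = np.meshgrid(
        np.arange(rows) + row_offset,
        np.arange(cols) + col_offset,
        indexing="ij",
    )
    ids = global_r * total_cols + global_c + 1  # 0 is reserved for "no link"
    return np.where(is_link_start, ids, 0)
```

With tile-local coordinates, two tiles would both hand out ID 1 to their own top-left cell. Passing the tile's offsets makes a tile's IDs equal to the corresponding slice of the full-array result.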

Test plan

  • All 158 existing + new tests pass for all six modules
  • New dask+cupy tests with multiple chunk sizes and random acyclic grids for each function
  • Verified that dask+cupy output matches dask+numpy output exactly for basin (a pre-existing tile-sweep convergence issue affects both backends identically)

@github-actions bot added the "performance" label (PR touches performance-sensitive code) on Mar 4, 2026
Replace CPU fallback with native GPU tile kernels for flow_accumulation,
watershed, basin, stream_order, stream_link, and snap_pour_point when
running on Dask+CuPy arrays. Each function now runs its existing CUDA
kernels per-tile with seed injection at tile boundaries, keeping data
GPU-resident throughout the iterative tile sweep. Also adds a native
CUDA kernel for snap_pour_point's single-GPU CuPy path.
@brendancol merged commit 7cad73b into master on Mar 4, 2026
11 checks passed