Add native Dask+CuPy backends for hydrology core functions (#952) #966
Merged
brendancol merged 1 commit into master on Mar 4, 2026
Conversation
Replace CPU fallback with native GPU tile kernels for flow_accumulation, watershed, basin, stream_order, stream_link, and snap_pour_point when running on Dask+CuPy arrays. Each function now runs its existing CUDA kernels per-tile with seed injection at tile boundaries, keeping data GPU-resident throughout the iterative tile sweep. Also adds a native CUDA kernel for snap_pour_point's single-GPU CuPy path.
Summary
- Native GPU tile kernels for `flow_accumulation`, `watershed`, `basin`, `stream_order`, `stream_link`, and `snap_pour_point` when running on Dask+CuPy arrays
- Native CUDA kernel for `snap_pour_point`'s single-GPU CuPy path (previously fell back to CPU)

What changed
Per-tile GPU kernels with seed injection: The Dask+CuPy path previously converted CuPy chunks to NumPy, ran the CPU tile kernel, then converted back. Now each tile runs the same GPU frontier-peeling kernels used by the single-GPU path, with external boundary values injected before the peeling loop starts. Seeds are transferred CPU-side only at tile boundaries (small O(edge_length) strips), while all tile-interior computation stays on GPU.
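The seed-injection idea can be illustrated with a minimal CPU stand-in: plain NumPy in place of the CUDA kernels, and a toy drainage pattern in which every cell flows east, so accumulation is a cumulative sweep along each row. The helper name and the one-pass sweep are hypothetical simplifications of the PR's iterative frontier-peeling loop; the point is only that injecting the neighbor tile's edge values before the sweep makes the tiled result match the full-array result.

```python
import numpy as np

def accumulate_tile(tile, seed=None):
    """Toy flow accumulation where every cell drains east.

    Each cell's accumulation = number of upstream cells + 1.
    `seed` carries the accumulation entering the tile's west edge,
    analogous to the external boundary values injected before the
    GPU peeling loop starts (an O(edge_length) strip per boundary).
    """
    rows, cols = tile.shape
    acc = np.ones((rows, cols), dtype=np.int64)
    if seed is not None:
        acc[:, 0] += seed           # inject inflow from the neighbor tile
    for c in range(1, cols):
        acc[:, c] += acc[:, c - 1]  # interior sweep (GPU-resident in the PR)
    return acc

terrain = np.zeros((2, 8))
full = accumulate_tile(terrain)     # single-array reference result

# Tile sweep: solve the west tile, then seed the east tile with its edge.
west = accumulate_tile(terrain[:, :4])
east = accumulate_tile(terrain[:, 4:], seed=west[:, -1])
tiled = np.concatenate([west, east], axis=1)

assert np.array_equal(full, tiled)
```

In the real backend the dependency between tiles is iterative (a tile may need to be revisited when a neighbor's edge changes), but the invariant shown here is the same: only the edge strips cross tile boundaries.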
snap_pour_point native CuPy: Added `_snap_pour_point_gpu`, a CUDA kernel in which each thread handles one pour point's windowed max search. The flow accumulation array stays on GPU instead of being pulled to CPU.

stream_link tile-aware kernel: Added `_stream_link_find_ready_tile`, a CUDA kernel that uses global coordinate offsets for position-based link IDs, so tile-local results are consistent with full-array results.

Test plan
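The position-based link IDs used by the `stream_link` change above can be sketched in NumPy. The function name and the direct labeling are illustrative only (the actual kernel combines this with the frontier-peeling sweep); the sketch shows why deriving an ID from global coordinates, rather than tile-local ones, makes per-tile output independent of how the array is chunked.

```python
import numpy as np

def stream_link_ids(stream_mask, row_off, col_off, global_cols):
    """Hypothetical sketch: label stream cells by global position.

    A kernel that labeled cells by tile-local (row, col) would produce
    different IDs for different chunkings; the global linear index
    (row + row_off) * global_cols + (col + col_off) is invariant, so
    tile-local results agree with full-array results.
    """
    rows, cols = np.nonzero(stream_mask)
    ids = np.full(stream_mask.shape, -1, dtype=np.int64)  # -1 = no stream
    ids[rows, cols] = (rows + row_off) * global_cols + (cols + col_off)
    return ids

# The same 2x2 tile placed at global offset (2, 4) in a 10-column raster:
mask = np.array([[0, 1],
                 [1, 0]], dtype=bool)
ids = stream_link_ids(mask, row_off=2, col_off=4, global_cols=10)
# Stream cell (0, 1) gets global ID (0+2)*10 + (1+4) = 25,
# cell (1, 0) gets (1+2)*10 + (0+4) = 34, regardless of chunking.
```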