Commit f97050e

Add preview() for memory-safe raster downsampling (#987)

* Add preview() for memory-safe downsampling of large rasters (#986)

  Uses xarray coarsen with block averaging for the numpy and dask backends,
  and stride-based subsampling for CuPy. Dask arrays stay lazy, so peak
  memory is bounded by the largest chunk plus the output. Accepts a
  DataArray or Dataset via the @supports_dataset decorator.

* Add tests for preview() across all backends (#986)

  Covers numpy, dask, cupy, dask+cupy, Dataset, NaN handling, block-averaging
  correctness, passthrough for small rasters, input validation, and accessor
  integration.

* Add preview() to API reference docs (#986)

* Add preview() user guide notebook (#986)

* Add bilinear and nearest downsample methods to preview() (#986)
1 parent 521e9f8 commit f97050e
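The commit message describes the 'mean' method as block averaging via xarray's coarsen. The same reduction can be sketched in plain numpy with a reshape (an illustrative stand-in, not the library code; `block_mean` is a hypothetical helper):

```python
import numpy as np

def block_mean(a, fy, fx):
    """Downsample a 2D array by averaging fy x fx blocks.

    Edges that do not fill a whole block are trimmed, mirroring the
    behavior of xarray's coarsen(..., boundary='trim').
    """
    h = (a.shape[0] // fy) * fy
    w = (a.shape[1] // fx) * fx
    trimmed = a[:h, :w]
    return trimmed.reshape(h // fy, fy, w // fx, fx).mean(axis=(1, 3))

a = np.arange(16, dtype=float).reshape(4, 4)
print(block_mean(a, 2, 2))   # [[ 2.5  4.5] [10.5 12.5]]
```

Each output pixel is the mean of one 2x2 block of the input, which is why previews made this way smooth noise instead of aliasing it.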

File tree

6 files changed

+554
-0
lines changed


docs/source/reference/utilities.rst

Lines changed: 7 additions & 0 deletions

@@ -32,6 +32,13 @@ Contours

     xrspatial.contour.contours

+Preview
+=======
+.. autosummary::
+   :toctree: _autosummary
+
+   xrspatial.preview.preview
+
 Diagnostics
 ===========
 .. autosummary::
Lines changed: 125 additions & 0 deletions

@@ -0,0 +1,125 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "rt7pp4omcq",
      "source": "# Preview: memory-safe thumbnails of large rasters\n\nWhen a raster is backed by dask (e.g. loaded lazily from Zarr or a stack of GeoTIFFs),\ncalling `.compute()` to visualize it can blow up your memory. `xrspatial.preview()`\ndownsamples the data to a target pixel size using block averaging, and the whole\noperation stays lazy until you ask for the result. Peak memory is bounded by\nthe largest chunk plus the small output array.\n\nThis notebook generates a 1 TB dask-backed terrain raster and previews it at\n1000x1000 pixels. A `dask.distributed` LocalCluster is started so you can\nwatch the task graph and worker memory in the dashboard.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "ivhk3f6ui7",
      "source": "import numpy as np\nimport xarray as xr\nimport dask.array as da\nimport matplotlib.pyplot as plt\n\nimport xrspatial\nfrom xrspatial import generate_terrain, preview",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "id": "lb7wkq291z",
      "source": "from dask.distributed import Client, LocalCluster\n\ncluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit=\"2GB\")\nclient = Client(cluster)\nprint(f\"Dashboard: {client.dashboard_link}\")\nclient",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "ouvgm7ttw1",
      "source": "## Generate a terrain tile\n\nFirst, create a 1024x1024 terrain tile using `generate_terrain`. This is the\nbuilding block we'll replicate into a massive dask array.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "yts07v5mgv9",
      "source": "# 1024x1024 in-memory terrain tile\ncanvas = xr.DataArray(np.zeros((1024, 1024), dtype=np.float32), dims=[\"y\", \"x\"])\ntile = generate_terrain(canvas, seed=12345)\n\nfig, ax = plt.subplots(figsize=(6, 6))\ntile.plot(ax=ax, cmap=\"terrain\")\nax.set_title(f\"Terrain tile ({tile.shape[0]}x{tile.shape[1]}, {tile.nbytes / 1e6:.1f} MB)\")\nax.set_aspect(\"equal\")\nplt.tight_layout()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "mc23kw3w94",
      "source": "## Tile it into a 1 TB dask array\n\nWe replicate the tile 512x512 times using `dask.array.tile` to get a\n524,288 x 524,288 raster. At float32 that's 1.1 TB of data. Nothing is\nactually computed here -- dask just records the tiling as a lazy graph.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "ire1hxtder",
      "source": "# Tile the small terrain into a ~1 TB dask array\nreps = 512\nbig_dask = da.tile(\n    da.from_array(tile.values, chunks=(1024, 1024)),\n    (reps, reps),\n)\nrows, cols = big_dask.shape\nbig = xr.DataArray(\n    big_dask,\n    dims=[\"y\", \"x\"],\n    coords={\"y\": np.arange(rows, dtype=np.float64), \"x\": np.arange(cols, dtype=np.float64)},\n)\n\nprint(f\"Shape: {big.shape[0]:,} x {big.shape[1]:,}\")\nprint(f\"Chunk size: {big_dask.chunksize}\")\nprint(f\"Num chunks: {big_dask.numblocks}\")\nprint(f\"Total size: {big_dask.nbytes / 1e12:.2f} TB\")\nprint(f\"Dtype: {big_dask.dtype}\")",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "3n94gc0t1tg",
      "source": "## Preview at 1000x1000\n\n`preview()` builds a lazy coarsen-then-mean graph. Calling `.compute()` on the\nresult materializes only the 1000x1000 output -- about 4 MB.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "skqz0wfgial",
      "source": "%%time\nsmall = preview(big, width=1000).compute()\n\nprint(f\"Output shape: {small.shape}\")\nprint(f\"Output size: {small.nbytes / 1e6:.1f} MB\")",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "id": "2jif06ajupn",
      "source": "fig, ax = plt.subplots(figsize=(8, 8))\nsmall.plot(ax=ax, cmap=\"terrain\")\nax.set_title(f\"1000x1000 preview of a {big_dask.nbytes / 1e12:.1f} TB raster\")\nax.set_aspect(\"equal\")\nplt.tight_layout()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "nrbcb74q9oa",
      "source": "## Different preview sizes\n\nYou can control both width and height. Omitting height preserves the aspect ratio.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "mqzjqxdvj4",
      "source": "fig, axes = plt.subplots(1, 3, figsize=(14, 4))\nfor ax, w in zip(axes, [100, 500, 2000]):\n    p = preview(big, width=w).compute()\n    p.plot(ax=ax, cmap=\"terrain\", add_colorbar=False)\n    ax.set_title(f\"{p.shape[0]}x{p.shape[1]} ({p.nbytes / 1e6:.1f} MB)\")\n    ax.set_aspect(\"equal\")\nplt.tight_layout()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "82h89j8n7em",
      "source": "## Accessor syntax\n\nYou can also call `preview` directly on a DataArray or Dataset via the `.xrs` accessor.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "jastfcpb3i",
      "source": "# Accessor on a DataArray\nsmall = big.xrs.preview(width=500).compute()\nprint(f\"DataArray accessor: {small.shape}\")\n\n# Accessor on a Dataset\nds = xr.Dataset({\"elevation\": big, \"slope_proxy\": big * 0.1})\nsmall_ds = ds.xrs.preview(width=500)\nfor name, var in small_ds.data_vars.items():\n    print(f\"Dataset var '{name}': {var.shape}\")",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "id": "f2s7vgc81u5",
      "source": "client.close()\ncluster.close()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.10.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
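The sizes the notebook quotes can be checked with a few lines of arithmetic, using only the tile size and dtype stated in the cells above:

```python
# Size arithmetic behind the notebook: a 1024x1024 float32 tile
# replicated 512x in each direction, previewed at 1000x1000.
reps, tile_px, itemsize = 512, 1024, 4   # float32 is 4 bytes per pixel

side = reps * tile_px                    # 524288 pixels per side
total_bytes = side * side * itemsize     # full raster, never materialized
out_bytes = 1000 * 1000 * itemsize       # the preview that IS materialized

print(f"{side:,} px per side")           # 524,288
print(f"{total_bytes / 1e12:.2f} TB")    # 1.10 TB -- the "1.1 TB" above
print(f"{out_bytes / 1e6:.1f} MB")       # 4.0 MB -- the "about 4 MB" output
```

The ratio between the two sizes is why a lazy graph matters: only the 4 MB result and one chunk at a time ever need to fit in worker memory.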

xrspatial/__init__.py

Lines changed: 1 addition & 0 deletions

@@ -65,6 +65,7 @@
 from xrspatial.pathfinding import a_star_search  # noqa
 from xrspatial.pathfinding import multi_stop_search  # noqa
 from xrspatial.perlin import perlin  # noqa
+from xrspatial.preview import preview  # noqa
 from xrspatial.proximity import allocation  # noqa
 from xrspatial.proximity import direction  # noqa
 from xrspatial.proximity import euclidean_distance  # noqa

xrspatial/accessor.py

Lines changed: 12 additions & 0 deletions

@@ -315,6 +315,12 @@ def spline(self, x, y, z, **kwargs):
         from .interpolate import spline
         return spline(x, y, z, self._obj, **kwargs)

+    # ---- Preview ----
+
+    def preview(self, **kwargs):
+        from .preview import preview
+        return preview(self._obj, **kwargs)
+
     # ---- Raster to vector ----

     def polygonize(self, **kwargs):
@@ -619,6 +625,12 @@ def surface_direction(self, elevation, **kwargs):
         from .surface_distance import surface_direction
         return surface_direction(self._obj, elevation, **kwargs)

+    # ---- Preview ----
+
+    def preview(self, **kwargs):
+        from .preview import preview
+        return preview(self._obj, **kwargs)
+
     # ---- Fire ----

     def burn_severity_class(self, **kwargs):
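The accessor methods above are thin delegators: each holds the wrapped DataArray or Dataset in `self._obj` and forwards keyword arguments to the module-level function. A minimal standalone sketch of that pattern (plain Python, with a stub standing in for `xrspatial.preview.preview`; in the real code, xarray's accessor-registration machinery is what binds the class to `.xrs`):

```python
# Stub standing in for the module-level xrspatial.preview.preview.
def preview(obj, width=1000, **kwargs):
    return f"preview of {obj!r} at width={width}"

class XrsAccessor:
    """Sketch of the delegation pattern used by the .xrs accessor."""

    def __init__(self, obj):
        self._obj = obj          # the wrapped DataArray or Dataset

    def preview(self, **kwargs):
        # The real accessor imports at call time to avoid circular
        # imports; here we just forward to the module-level function.
        return preview(self._obj, **kwargs)

print(XrsAccessor("raster").preview(width=500))
# preview of 'raster' at width=500
```

Deferring the import to inside the method is also why adding a new accessor method costs nothing at import time.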

xrspatial/preview.py

Lines changed: 159 additions & 0 deletions

@@ -0,0 +1,159 @@
"""Memory-safe raster preview via downsampling."""

import numpy as np
import xarray as xr

from xrspatial.dataset_support import supports_dataset
from xrspatial.utils import (
    _validate_raster,
    has_cuda_and_cupy,
    is_cupy_array,
)

_METHODS = ('mean', 'nearest', 'bilinear')


def _bilinear_numpy(data, out_h, out_w):
    """Bilinear interpolation on a 2D numpy array."""
    from scipy.ndimage import zoom

    zoom_y = out_h / data.shape[0]
    zoom_x = out_w / data.shape[1]
    return zoom(data, (zoom_y, zoom_x), order=1)


def _bilinear_cupy(data, out_h, out_w):
    """Bilinear interpolation on a 2D cupy array."""
    from cupyx.scipy.ndimage import zoom

    zoom_y = out_h / data.shape[0]
    zoom_x = out_w / data.shape[1]
    return zoom(data, (zoom_y, zoom_x), order=1)


@supports_dataset
def preview(agg, width=1000, height=None, method='mean', name='preview'):
    """Downsample a raster to target pixel dimensions.

    For dask-backed arrays, the operation is lazy: each chunk is reduced
    independently, so peak memory is bounded by the largest chunk plus
    the small output array. A 30 TB raster can be previewed at
    1000x1000 with only a few MB of RAM.

    Parameters
    ----------
    agg : xr.DataArray
        Input raster (2D).
    width : int, default 1000
        Target width in pixels.
    height : int, optional
        Target height in pixels. If not provided, computed from *width*,
        preserving the aspect ratio of *agg*.
    method : str, default 'mean'
        Downsampling method. One of:

        - ``'mean'``: block averaging via ``xarray.coarsen``.
        - ``'nearest'``: stride-based subsampling (fastest, no smoothing).
        - ``'bilinear'``: bilinear interpolation via ``scipy.ndimage.zoom``.
    name : str, default 'preview'
        Name for the output DataArray.

    Returns
    -------
    xr.DataArray
        Downsampled raster with updated coordinates.
    """
    _validate_raster(agg, func_name='preview', ndim=2)

    if method not in _METHODS:
        raise ValueError(
            f"method must be one of {_METHODS!r}, got {method!r}"
        )

    h = agg.sizes[agg.dims[0]]
    w = agg.sizes[agg.dims[1]]

    if height is None:
        height = max(1, round(width * h / w))

    factor_y = max(1, h // height)
    factor_x = max(1, w // width)

    if factor_y <= 1 and factor_x <= 1:
        # Already at or below the target size: pass through unchanged.
        return agg

    y_dim = agg.dims[0]
    x_dim = agg.dims[1]

    out_h = h // factor_y
    out_w = w // factor_x

    if method == 'nearest':
        result = agg.isel(
            {y_dim: slice(None, None, factor_y),
             x_dim: slice(None, None, factor_x)}
        )
    elif method == 'bilinear':
        result = _preview_bilinear(agg, out_h, out_w, y_dim, x_dim)
    else:
        # method == 'mean'
        if has_cuda_and_cupy() and is_cupy_array(agg.data):
            # xarray coarsen has edge cases with cupy; fall back to
            # stride-based subsampling.
            result = agg.isel(
                {y_dim: slice(None, None, factor_y),
                 x_dim: slice(None, None, factor_x)}
            )
        else:
            result = agg.coarsen(
                {y_dim: factor_y, x_dim: factor_x}, boundary='trim'
            ).mean()

    result.name = name
    return result


def _preview_bilinear(agg, out_h, out_w, y_dim, x_dim):
    """Apply bilinear interpolation, handling numpy/cupy/dask backends."""
    import dask.array as da

    if isinstance(agg.data, da.Array):
        # For dask, interpolate onto the target coordinates with
        # xarray's interp, which handles dask arrays natively and
        # keeps the operation lazy.
        y_coords = agg.coords[y_dim]
        x_coords = agg.coords[x_dim]
        new_y = np.linspace(float(y_coords[0]), float(y_coords[-1]), out_h)
        new_x = np.linspace(float(x_coords[0]), float(x_coords[-1]), out_w)
        return agg.interp({y_dim: new_y, x_dim: new_x}, method='linear')

    if has_cuda_and_cupy() and is_cupy_array(agg.data):
        out_data = _bilinear_cupy(agg.data, out_h, out_w)
    else:
        out_data = _bilinear_numpy(agg.data, out_h, out_w)

    y_coords = agg.coords[y_dim].values
    x_coords = agg.coords[x_dim].values
    new_y = np.linspace(y_coords[0], y_coords[-1], out_h)
    new_x = np.linspace(x_coords[0], x_coords[-1], out_w)
    return xr.DataArray(
        out_data,
        dims=[y_dim, x_dim],
        coords={y_dim: new_y, x_dim: new_x},
        attrs=agg.attrs,
    )
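The height/factor arithmetic in `preview()` fixes the exact output size. For the square 524,288-pixel raster from the user guide notebook it works out to exactly 1000x1000, which can be checked standalone with the same expressions:

```python
# Reproduce preview()'s height and downsample-factor arithmetic
# for the square raster from the user guide notebook.
h, w, width = 524288, 524288, 1000

height = max(1, round(width * h / w))    # 1000: square input stays square
factor_y = max(1, h // height)           # 524288 // 1000 == 524
factor_x = max(1, w // width)            # 524

# coarsen(..., boundary='trim') drops the 288-pixel remainder,
# so the output is exactly h // factor_y by w // factor_x.
out_h, out_w = h // factor_y, w // factor_x
print(out_h, out_w)                      # 1000 1000
```

Because the factors are integers, the output can be slightly larger than the request when the sizes do not divide evenly; the `'trim'` boundary guarantees it is never padded.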
