Commit f97050e

Add preview() for memory-safe raster downsampling (#987)

* Add preview() for memory-safe downsampling of large rasters (#986)

  Uses xarray coarsen with block averaging for the numpy and dask backends,
  and stride-based subsampling for CuPy. Dask arrays stay lazy, so peak
  memory is bounded by the largest chunk plus the output. Accepts a
  DataArray or Dataset via the @supports_dataset decorator.

* Add tests for preview() across all backends (#986)

  Covers numpy, dask, cupy, dask+cupy, Dataset, NaN handling, block-averaging
  correctness, passthrough for small rasters, input validation, and accessor
  integration.

* Add preview() to API reference docs (#986)

* Add preview() user guide notebook (#986)

* Add bilinear and nearest downsample methods to preview() (#986)
1 parent 521e9f8 commit f97050e
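The commit message describes the 'mean' method as block averaging via xarray's coarsen. The same reduction can be sketched in plain numpy with a reshape (an illustrative stand-in, not the library code; `block_mean` is a hypothetical helper):

```python
import numpy as np

def block_mean(a, fy, fx):
    """Downsample a 2D array by averaging fy x fx blocks.

    Edges that do not fill a whole block are trimmed, mirroring the
    behavior of xarray's coarsen(..., boundary='trim').
    """
    h = (a.shape[0] // fy) * fy
    w = (a.shape[1] // fx) * fx
    trimmed = a[:h, :w]
    return trimmed.reshape(h // fy, fy, w // fx, fx).mean(axis=(1, 3))

a = np.arange(16, dtype=float).reshape(4, 4)
print(block_mean(a, 2, 2))   # [[ 2.5  4.5] [10.5 12.5]]
```

Each output pixel is the mean of one 2x2 block of the input, which is why previews made this way smooth noise instead of aliasing it.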

File tree

6 files changed

+554
-0
lines changed


docs/source/reference/utilities.rst

Lines changed: 7 additions & 0 deletions

@@ -32,6 +32,13 @@ Contours

     xrspatial.contour.contours

+Preview
+=======
+.. autosummary::
+   :toctree: _autosummary
+
+   xrspatial.preview.preview
+
 Diagnostics
 ===========
 .. autosummary::
Lines changed: 125 additions & 0 deletions

@@ -0,0 +1,125 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "rt7pp4omcq",
      "source": "# Preview: memory-safe thumbnails of large rasters\n\nWhen a raster is backed by dask (e.g. loaded lazily from Zarr or a stack of GeoTIFFs),\ncalling `.compute()` to visualize it can blow up your memory. `xrspatial.preview()`\ndownsamples the data to a target pixel size using block averaging, and the whole\noperation stays lazy until you ask for the result. Peak memory is bounded by\nthe largest chunk plus the small output array.\n\nThis notebook generates a 1 TB dask-backed terrain raster and previews it at\n1000x1000 pixels. A `dask.distributed` LocalCluster is started so you can\nwatch the task graph and worker memory in the dashboard.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "ivhk3f6ui7",
      "source": "import numpy as np\nimport xarray as xr\nimport dask.array as da\nimport matplotlib.pyplot as plt\n\nimport xrspatial\nfrom xrspatial import generate_terrain, preview",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "id": "lb7wkq291z",
      "source": "from dask.distributed import Client, LocalCluster\n\ncluster = LocalCluster(n_workers=4, threads_per_worker=2, memory_limit=\"2GB\")\nclient = Client(cluster)\nprint(f\"Dashboard: {client.dashboard_link}\")\nclient",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "ouvgm7ttw1",
      "source": "## Generate a terrain tile\n\nFirst, create a 1024x1024 terrain tile using `generate_terrain`. This is the\nbuilding block we'll replicate into a massive dask array.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "yts07v5mgv9",
      "source": "# 1024x1024 in-memory terrain tile\ncanvas = xr.DataArray(np.zeros((1024, 1024), dtype=np.float32), dims=[\"y\", \"x\"])\ntile = generate_terrain(canvas, seed=12345)\n\nfig, ax = plt.subplots(figsize=(6, 6))\ntile.plot(ax=ax, cmap=\"terrain\")\nax.set_title(f\"Terrain tile ({tile.shape[0]}x{tile.shape[1]}, {tile.nbytes / 1e6:.1f} MB)\")\nax.set_aspect(\"equal\")\nplt.tight_layout()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "mc23kw3w94",
      "source": "## Tile it into a 1 TB dask array\n\nWe replicate the tile 512x512 times using `dask.array.tile` to get a\n524,288 x 524,288 raster. At float32 that's 1.1 TB of data. Nothing is\nactually computed here -- dask just records the tiling as a lazy graph.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "ire1hxtder",
      "source": "# Tile the small terrain into a ~1 TB dask array\nreps = 512\nbig_dask = da.tile(\n    da.from_array(tile.values, chunks=(1024, 1024)),\n    (reps, reps),\n)\nrows, cols = big_dask.shape\nbig = xr.DataArray(\n    big_dask,\n    dims=[\"y\", \"x\"],\n    coords={\"y\": np.arange(rows, dtype=np.float64), \"x\": np.arange(cols, dtype=np.float64)},\n)\n\nprint(f\"Shape: {big.shape[0]:,} x {big.shape[1]:,}\")\nprint(f\"Chunk size: {big_dask.chunksize}\")\nprint(f\"Num chunks: {big_dask.numblocks}\")\nprint(f\"Total size: {big_dask.nbytes / 1e12:.2f} TB\")\nprint(f\"Dtype: {big_dask.dtype}\")",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "3n94gc0t1tg",
      "source": "## Preview at 1000x1000\n\n`preview()` builds a lazy coarsen-then-mean graph. Calling `.compute()` on the\nresult materializes only the 1000x1000 output -- about 4 MB.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "skqz0wfgial",
      "source": "%%time\nsmall = preview(big, width=1000).compute()\n\nprint(f\"Output shape: {small.shape}\")\nprint(f\"Output size: {small.nbytes / 1e6:.1f} MB\")",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "id": "2jif06ajupn",
      "source": "fig, ax = plt.subplots(figsize=(8, 8))\nsmall.plot(ax=ax, cmap=\"terrain\")\nax.set_title(f\"1000x1000 preview of a {big_dask.nbytes / 1e12:.1f} TB raster\")\nax.set_aspect(\"equal\")\nplt.tight_layout()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "nrbcb74q9oa",
      "source": "## Different preview sizes\n\nYou can control both width and height. Omitting height preserves the aspect ratio.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "mqzjqxdvj4",
      "source": "fig, axes = plt.subplots(1, 3, figsize=(14, 4))\nfor ax, w in zip(axes, [100, 500, 2000]):\n    p = preview(big, width=w).compute()\n    p.plot(ax=ax, cmap=\"terrain\", add_colorbar=False)\n    ax.set_title(f\"{p.shape[0]}x{p.shape[1]} ({p.nbytes / 1e6:.1f} MB)\")\n    ax.set_aspect(\"equal\")\nplt.tight_layout()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "82h89j8n7em",
      "source": "## Accessor syntax\n\nYou can also call `preview` directly on a DataArray or Dataset via the `.xrs` accessor.",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "id": "jastfcpb3i",
      "source": "# Accessor on a DataArray\nsmall = big.xrs.preview(width=500).compute()\nprint(f\"DataArray accessor: {small.shape}\")\n\n# Accessor on a Dataset\nds = xr.Dataset({\"elevation\": big, \"slope_proxy\": big * 0.1})\nsmall_ds = ds.xrs.preview(width=500)\nfor name, var in small_ds.data_vars.items():\n    print(f\"Dataset var '{name}': {var.shape}\")",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "id": "f2s7vgc81u5",
      "source": "client.close()\ncluster.close()",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.10.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
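The sizes the notebook quotes can be checked with a few lines of arithmetic, using only the tile size and dtype stated in the cells above:

```python
# Size arithmetic behind the notebook: a 1024x1024 float32 tile
# replicated 512x in each direction, previewed at 1000x1000.
reps, tile_px, itemsize = 512, 1024, 4   # float32 is 4 bytes per pixel

side = reps * tile_px                    # 524288 pixels per side
total_bytes = side * side * itemsize     # full raster, never materialized
out_bytes = 1000 * 1000 * itemsize       # the preview that IS materialized

print(f"{side:,} px per side")           # 524,288
print(f"{total_bytes / 1e12:.2f} TB")    # 1.10 TB -- the "1.1 TB" above
print(f"{out_bytes / 1e6:.1f} MB")       # 4.0 MB -- the "about 4 MB" output
```

The ratio between the two sizes is why a lazy graph matters: only the 4 MB result and one chunk at a time ever need to fit in worker memory.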

xrspatial/__init__.py

Lines changed: 1 addition & 0 deletions

@@ -65,6 +65,7 @@
 from xrspatial.pathfinding import a_star_search  # noqa
 from xrspatial.pathfinding import multi_stop_search  # noqa
 from xrspatial.perlin import perlin  # noqa
+from xrspatial.preview import preview  # noqa
 from xrspatial.proximity import allocation  # noqa
 from xrspatial.proximity import direction  # noqa
 from xrspatial.proximity import euclidean_distance  # noqa

xrspatial/accessor.py

Lines changed: 12 additions & 0 deletions

@@ -315,6 +315,12 @@ def spline(self, x, y, z, **kwargs):
         from .interpolate import spline
         return spline(x, y, z, self._obj, **kwargs)

+    # ---- Preview ----
+
+    def preview(self, **kwargs):
+        from .preview import preview
+        return preview(self._obj, **kwargs)
+
     # ---- Raster to vector ----

     def polygonize(self, **kwargs):
@@ -619,6 +625,12 @@ def surface_direction(self, elevation, **kwargs):
         from .surface_distance import surface_direction
         return surface_direction(self._obj, elevation, **kwargs)

+    # ---- Preview ----
+
+    def preview(self, **kwargs):
+        from .preview import preview
+        return preview(self._obj, **kwargs)
+
     # ---- Fire ----

     def burn_severity_class(self, **kwargs):
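The accessor methods above are thin delegators: each holds the wrapped DataArray or Dataset in `self._obj` and forwards keyword arguments to the module-level function. A minimal standalone sketch of that pattern (plain Python, with a stub standing in for `xrspatial.preview.preview`; in the real code, xarray's accessor-registration machinery is what binds the class to `.xrs`):

```python
# Stub standing in for the module-level xrspatial.preview.preview.
def preview(obj, width=1000, **kwargs):
    return f"preview of {obj!r} at width={width}"

class XrsAccessor:
    """Sketch of the delegation pattern used by the .xrs accessor."""

    def __init__(self, obj):
        self._obj = obj          # the wrapped DataArray or Dataset

    def preview(self, **kwargs):
        # The real accessor imports at call time to avoid circular
        # imports; here we just forward to the module-level function.
        return preview(self._obj, **kwargs)

print(XrsAccessor("raster").preview(width=500))
# preview of 'raster' at width=500
```

Deferring the import to inside the method is also why adding a new accessor method costs nothing at import time.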

xrspatial/preview.py

Lines changed: 159 additions & 0 deletions

@@ -0,0 +1,159 @@
"""Memory-safe raster preview via downsampling."""

import numpy as np
import xarray as xr

from xrspatial.dataset_support import supports_dataset
from xrspatial.utils import (
    _validate_raster,
    has_cuda_and_cupy,
    is_cupy_array,
)

_METHODS = ('mean', 'nearest', 'bilinear')


def _bilinear_numpy(data, out_h, out_w):
    """Bilinear interpolation on a 2D numpy array."""
    from scipy.ndimage import zoom

    zoom_y = out_h / data.shape[0]
    zoom_x = out_w / data.shape[1]
    return zoom(data, (zoom_y, zoom_x), order=1)


def _bilinear_cupy(data, out_h, out_w):
    """Bilinear interpolation on a 2D cupy array."""
    from cupyx.scipy.ndimage import zoom

    zoom_y = out_h / data.shape[0]
    zoom_x = out_w / data.shape[1]
    return zoom(data, (zoom_y, zoom_x), order=1)


@supports_dataset
def preview(agg, width=1000, height=None, method='mean', name='preview'):
    """Downsample a raster to target pixel dimensions.

    For dask-backed arrays, the operation is lazy: each chunk is reduced
    independently, so peak memory is bounded by the largest chunk plus
    the small output array. A 30 TB raster can be previewed at
    1000x1000 with only a few MB of RAM.

    Parameters
    ----------
    agg : xr.DataArray
        Input raster (2D).
    width : int, default 1000
        Target width in pixels.
    height : int, optional
        Target height in pixels. If not provided, computed from *width*,
        preserving the aspect ratio of *agg*.
    method : str, default 'mean'
        Downsampling method. One of:

        - ``'mean'``: block averaging via ``xarray.coarsen``.
        - ``'nearest'``: stride-based subsampling (fastest, no smoothing).
        - ``'bilinear'``: bilinear interpolation via ``scipy.ndimage.zoom``.
    name : str, default 'preview'
        Name for the output DataArray.

    Returns
    -------
    xr.DataArray
        Downsampled raster with updated coordinates.
    """
    _validate_raster(agg, func_name='preview', ndim=2)

    if method not in _METHODS:
        raise ValueError(
            f"method must be one of {_METHODS!r}, got {method!r}"
        )

    h = agg.sizes[agg.dims[0]]
    w = agg.sizes[agg.dims[1]]

    if height is None:
        height = max(1, round(width * h / w))

    factor_y = max(1, h // height)
    factor_x = max(1, w // width)

    if factor_y <= 1 and factor_x <= 1:
        # Already at or below the target size: pass through unchanged.
        return agg

    y_dim = agg.dims[0]
    x_dim = agg.dims[1]

    out_h = h // factor_y
    out_w = w // factor_x

    if method == 'nearest':
        result = agg.isel(
            {y_dim: slice(None, None, factor_y),
             x_dim: slice(None, None, factor_x)}
        )
    elif method == 'bilinear':
        result = _preview_bilinear(agg, out_h, out_w, y_dim, x_dim)
    else:
        # method == 'mean'
        if has_cuda_and_cupy() and is_cupy_array(agg.data):
            # xarray coarsen has edge cases with cupy; fall back to
            # stride-based subsampling.
            result = agg.isel(
                {y_dim: slice(None, None, factor_y),
                 x_dim: slice(None, None, factor_x)}
            )
        else:
            result = agg.coarsen(
                {y_dim: factor_y, x_dim: factor_x}, boundary='trim'
            ).mean()

    result.name = name
    return result


def _preview_bilinear(agg, out_h, out_w, y_dim, x_dim):
    """Apply bilinear interpolation, handling numpy/cupy/dask backends."""
    import dask.array as da

    if isinstance(agg.data, da.Array):
        # For dask, interpolate onto the target coordinates with
        # xarray's interp, which handles dask arrays natively and
        # keeps the operation lazy.
        y_coords = agg.coords[y_dim]
        x_coords = agg.coords[x_dim]
        new_y = np.linspace(float(y_coords[0]), float(y_coords[-1]), out_h)
        new_x = np.linspace(float(x_coords[0]), float(x_coords[-1]), out_w)
        return agg.interp({y_dim: new_y, x_dim: new_x}, method='linear')

    if has_cuda_and_cupy() and is_cupy_array(agg.data):
        out_data = _bilinear_cupy(agg.data, out_h, out_w)
    else:
        out_data = _bilinear_numpy(agg.data, out_h, out_w)

    y_coords = agg.coords[y_dim].values
    x_coords = agg.coords[x_dim].values
    new_y = np.linspace(y_coords[0], y_coords[-1], out_h)
    new_x = np.linspace(x_coords[0], x_coords[-1], out_w)
    return xr.DataArray(
        out_data,
        dims=[y_dim, x_dim],
        coords={y_dim: new_y, x_dim: new_x},
        attrs=agg.attrs,
    )
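The height/factor arithmetic in `preview()` fixes the exact output size. For the square 524,288-pixel raster from the user guide notebook it works out to exactly 1000x1000, which can be checked standalone with the same expressions:

```python
# Reproduce preview()'s height and downsample-factor arithmetic
# for the square raster from the user guide notebook.
h, w, width = 524288, 524288, 1000

height = max(1, round(width * h / w))    # 1000: square input stays square
factor_y = max(1, h // height)           # 524288 // 1000 == 524
factor_x = max(1, w // width)            # 524

# coarsen(..., boundary='trim') drops the 288-pixel remainder,
# so the output is exactly h // factor_y by w // factor_x.
out_h, out_w = h // factor_y, w // factor_x
print(out_h, out_w)                      # 1000 1000
```

Because the factors are integers, the output can be slightly larger than the request when the sizes do not divide evenly; the `'trim'` boundary guarantees it is never padded.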
