Raster-to-polygon aggregations of gridMET meteorological data. The spatial aggregations are performed from the gridMET grid/raster (NetCDF) to polygons (shapefiles).
gridMET combines high-resolution spatial climate data (e.g. temperature, precipitation, humidity) from PRISM with daily temporal attributes and additional meteorological variables from the regional reanalysis dataset NLDAS-2. The resulting product is a dataset of daily meteorological variables at ~4km x 4km resolution across the contiguous U.S.
- `GEOID` {string}: Geographic ID of the aggregation level (U.S. county, ZCTA, or grid cell)
- `year` {int}: Dataset year
- `sph` {float64}: Specific humidity (kg/kg), the mass of water vapor per unit mass of air
- `vpd` {float64}: Vapor pressure deficit (hPa), the difference between the amount of moisture in the air and how much moisture the air can hold when saturated
- `tmmn` {float64}: Minimum daily temperature (Kelvin)
- `tmmx` {float64}: Maximum daily temperature (Kelvin)
- `pr` {float64}: Precipitation (mm), total daily precipitation
- `rmin` {float64}: Minimum relative humidity (%), the lowest daily relative humidity recorded
- `rmax` {float64}: Maximum relative humidity (%), the highest daily relative humidity recorded
- `srad` {float64}: Downward shortwave solar radiation (W/m²), the solar energy received at the surface
- `vs` {float64}: Wind speed at 10 meters (m/s), the average daily wind speed at 10 m above ground level
- `th` {float64}: Wind direction at 10 meters (degrees from north), the direction from which the wind is blowing
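For a quick sanity check against this dictionary, the outputs can be inspected with pandas. The path below is hypothetical; actual file names follow the patterns described later in this README:

```python
import pandas as pd

# Hypothetical path; actual outputs follow the naming patterns described below.
df = pd.read_parquet(
    "data/county/output/daily/meteorology__gridmet__county_daily__2020.parquet"
)
print(df.dtypes)  # meteorological variables should be float64
print(df.head())
```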
The pipeline transforms gridMET raster data (NetCDF format) into aggregated polygon-level statistics through several stages:
**Stage 1: Download raw data**
- Downloads raw gridMET NetCDF files from the gridMET repository, one file per variable per year
- Output: `data/{geo_name}/input/raw/{var}_{year}.nc`
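A minimal download sketch is shown below. The URL pattern is an assumption based on the Climatology Lab's public gridMET server and may differ from what the pipeline actually uses:

```python
import requests

# Assumed public gridMET download location; verify before relying on it.
BASE_URL = "https://www.northwestknowledge.net/metdata/data"
var, year = "tmmx", 2020

resp = requests.get(f"{BASE_URL}/{var}_{year}.nc", stream=True, timeout=120)
resp.raise_for_status()
with open(f"data/county/input/raw/{var}_{year}.nc", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        f.write(chunk)  # stream to disk in 1 MB chunks
```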
**Stage 2: Spatial aggregation**
- Performs zonal statistics to aggregate raster grid cells to polygon boundaries (counties, ZCTAs, or custom shapefiles)
- Uses weighted averages based on the overlap between grid cells and polygons
- Processes each variable and year independently
- Output: `data/{geo_name}/intermediate/{var}_{year}_{polygon_name}.parquet`
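The sketch below illustrates the idea of overlap-weighted zonal statistics with xarray and geopandas. It is a simplified illustration, not the pipeline's implementation: the file names, the NetCDF variable name, and the use of unprojected degree areas as weights are all assumptions.

```python
import geopandas as gpd
import numpy as np
import xarray as xr
from shapely.geometry import box

ds = xr.open_dataset("data/county/input/raw/tmmx_2020.nc")  # hypothetical path
polys = gpd.read_file("county_2020/county_2020.shp").to_crs("EPSG:4326")

# Build one rectangle per grid cell from the 1-D lat/lon coordinates.
lats, lons = ds["lat"].values, ds["lon"].values
dy, dx = abs(lats[1] - lats[0]), abs(lons[1] - lons[0])
cells = gpd.GeoDataFrame(
    {
        "row": np.repeat(np.arange(lats.size), lons.size),
        "col": np.tile(np.arange(lons.size), lats.size),
    },
    geometry=[
        box(x - dx / 2, y - dy / 2, x + dx / 2, y + dy / 2)
        for y in lats
        for x in lons
    ],
    crs="EPSG:4326",
)

# Intersect cells with polygons; the overlap area becomes the weight.
# (A real implementation would compute areas in an equal-area projection.)
inter = gpd.overlay(cells, polys[["GEOID", "geometry"]], how="intersection")
inter["weight"] = inter.geometry.area

vals = ds["air_temperature"].isel(day=0).values  # variable name is an assumption
inter["wval"] = vals[inter["row"].to_numpy(), inter["col"].to_numpy()] * inter["weight"]
sums = inter.groupby("GEOID")[["wval", "weight"]].sum()
weighted_mean = sums["wval"] / sums["weight"]  # one value per polygon
```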
**Stage 3: Daily join**
- Joins all meteorological variables into a single daily dataset
- Ensures data consistency and removes null values
- Creates a unified time series with all variables for each geographic unit
- Output: `data/{geo_name}/output/daily/meteorology__gridmet__{polygon_name}_daily__{year}.parquet`
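Conceptually, this stage is a sequence of joins over the per-variable intermediate files. A hedged sketch, assuming each intermediate file carries the geographic ID, a date column, and one variable:

```python
from functools import reduce

import pandas as pd

# Variable list and paths are illustrative; the pipeline derives them from config.
variables = ["sph", "vpd", "tmmn", "tmmx", "pr", "rmin", "rmax", "srad", "vs", "th"]
frames = [
    pd.read_parquet(f"data/county/intermediate/{v}_2020_county.parquet")
    for v in variables
]
# Successively merge on the shared keys, then drop rows with missing values.
daily = reduce(lambda a, b: a.merge(b, on=["GEOID", "date"]), frames).dropna()
```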
**Stage 4: Yearly aggregation**
- Calculates the annual average of each meteorological variable
- Groups by geographic unit (county/ZCTA/grid cell)
- Output: `data/{geo_name}/output/yearly/meteorology__gridmet__{polygon_name}_yearly__{year}.parquet`
- Columns: `{polygon_name}`, `year`, and average values for each gridMET variable
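The yearly aggregation amounts to a group-by mean over the daily output. A minimal sketch, with a hypothetical path and an assumed `date` column:

```python
import pandas as pd

daily = pd.read_parquet(
    "data/county/output/daily/meteorology__gridmet__county_daily__2020.parquet"
)
daily["year"] = pd.to_datetime(daily["date"]).dt.year
# Average every meteorological variable within each geographic unit and year.
yearly = (
    daily.drop(columns=["date"])
    .groupby(["GEOID", "year"])
    .mean(numeric_only=True)
    .reset_index()
)
```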
**Stage 5: Seasonal aggregation**
- Calculates seasonal averages based on configurable season definitions (see `conf/seasons.yaml`)
- Default seasons:
  - Summer: June, July, August
  - Winter: December, January, February (all from the same calendar year)
- Additional seasons can be configured in `conf/seasons.yaml`
- Each season's variables are suffixed with the season name (e.g., `tmmx_summer`, `pr_winter`)
- Output: `data/{geo_name}/output/seasonal/meteorology__gridmet__{polygon_name}_seasonal__{year}.parquet`
- Columns: `{polygon_name}`, `year`, and seasonal averages (e.g., `tmmx_summer`, `tmmn_winter`)
All outputs are stored in Parquet format for efficient storage and fast querying:
**Daily**
- Path: `data/{geo_name}/output/daily/meteorology__gridmet__{polygon_name}_daily__{year}.parquet`
- Granularity: daily values for each geographic unit
- Columns: geographic ID, date, and all gridMET variables listed in the data dictionary above
**Yearly**
- Path: `data/{geo_name}/output/yearly/meteorology__gridmet__{polygon_name}_yearly__{year}.parquet`
- Granularity: annual averages for each geographic unit
- Columns: geographic ID, year, and mean values for all gridMET variables
**Seasonal**
- Path: `data/{geo_name}/output/seasonal/meteorology__gridmet__{polygon_name}_seasonal__{year}.parquet`
- Granularity: seasonal averages for each geographic unit
- Columns: geographic ID, year, and season-specific mean values (e.g., `tmmx_summer`, `pr_winter`)
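To build a multi-year panel from these outputs, the per-year files can be concatenated, for example (paths are illustrative):

```python
import glob

import pandas as pd

# Illustrative pattern; substitute your geo_name and polygon_name.
files = sorted(
    glob.glob("data/county/output/yearly/meteorology__gridmet__county_yearly__*.parquet")
)
panel = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```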
Seasonal definitions are configured in `conf/seasons.yaml`. Each season specifies:
- `months`: list of month numbers (1 = January, 12 = December)

Example configuration:

```yaml
summer:
  months: [6, 7, 8]    # June, July, August
winter:
  months: [12, 1, 2]   # December, January, February
```
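To illustrate how these definitions drive the aggregation, including the convention that winter months come from the same calendar year, here is a pandas sketch with column names assumed from the data dictionary above (not the pipeline's actual implementation):

```python
import pandas as pd
import yaml

with open("conf/seasons.yaml") as f:
    seasons = yaml.safe_load(f)  # e.g. {"summer": {"months": [6, 7, 8]}, ...}

daily = pd.read_parquet(
    "data/county/output/daily/meteorology__gridmet__county_daily__2020.parquet"
)
daily["date"] = pd.to_datetime(daily["date"])
daily["month"] = daily["date"].dt.month
# Months are grouped within the same calendar year, so December 2020 is
# averaged with January/February 2020 for winter, per the default convention.
daily["year"] = daily["date"].dt.year

parts = []
for name, spec in seasons.items():
    sub = daily[daily["month"].isin(spec["months"])]
    agg = (
        sub.groupby(["GEOID", "year"])
        .mean(numeric_only=True)
        .drop(columns=["month"])
        .add_suffix(f"_{name}")  # e.g. tmmx -> tmmx_summer
    )
    parts.append(agg)
seasonal = pd.concat(parts, axis=1).reset_index()
```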
Clone the repository and create a conda environment:

```bash
git clone https://github.com/<user>/<repo>
cd <repo>
conda env create -f requirements.yml
conda activate <env_name>  # environment name as found in requirements.yml
```

It is also possible to use mamba:

```bash
mamba env create -f requirements.yml
mamba activate <env_name>
```

Determine the configuration file to be used in `cfg.datapaths`. The `input`, `intermediate`, and `output` arguments are used in `utils/create_dir_paths.py` to fix the paths or directories from which each step in the pipeline reads and writes its data inside the corresponding `/data` subfolders.
If `cfg.datapaths` points to `<input_path>` or `<output_path>`, then `utils/create_dir_paths.py` will automatically create a symlink as in the following example:

```bash
export HOME_DIR=$(pwd)
cd $HOME_DIR/data/input/
ln -s <input_path> .
cd $HOME_DIR/data/output/
ln -s <output_path> .
```

It is also possible to run this pipeline to aggregate over your own custom shapefile. To do this, follow these steps:
- Create a `conf/datapaths/{shapefile_name}.yaml` that contains the locations of input, intermediate, and output files. An example is given in `county_cannon.yaml`.
- Create a `conf/shapefiles/{shapefile_name}.yaml` with important metadata for your shapefile (see the illustrative example after this list). The following metadata is required:
  - `years`: available shapefile years (list)
  - `idvar`: ID column name
  - `shapefile_prefix`: base naming format
- Modify the `datapaths` and `shapefiles` entries in `conf/config.yaml` to match these new config files. For example:

```yaml
defaults:
  - _self_
  - datapaths: grid4x4_cannon
  - gridmet
  - shapefiles: grid_4x4km
```

NB: this pipeline expects shapefiles to be stored in paths of the form `{shapefile_prefix}_{shapefile_year}/{shapefile_prefix}_{shapefile_year}.shp`.
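For reference, a hypothetical `conf/shapefiles/{shapefile_name}.yaml` might look like the following (all values are illustrative):

```yaml
# Illustrative values only; match these to your own shapefile.
years: [2015, 2020]            # available shapefile years
idvar: GEOID                   # ID column name
shapefile_prefix: my_polygons  # expects my_polygons_2020/my_polygons_2020.shp
```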
You can run the snakemake pipeline described in the `Snakefile`:

```bash
export PYTHONPATH="."
snakemake --cores 4
```
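Snakemake's standard dry-run flag (a generic snakemake feature, not specific to this pipeline) can preview the planned jobs first:

```bash
snakemake -n --cores 4  # dry run: list the jobs without executing them
```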
To run with Docker, create the folder where you would like to store the output dataset, then pull and run the image:

```bash
mkdir <path>
docker pull nsaph/gridmet_raster2polygon
docker run -v <path>:/app/data/ nsaph/gridmet_raster2polygon
```

If you want to build your own image, use the Dockerfile in the GitHub repository:

```bash
docker build -t <image_name> .
```