A Python library and driver script for preparing CESM and NorESM native model output for submission to CMIP7 via CMOR (Climate Model Output Rewriter).
cmip7-prep automates the pipeline from raw model timeseries to CMOR-compliant NetCDF:
- Variable mapping: reads a YAML mapping file (`cesm_to_cmip7.yaml` or `noresm_to_cmip7.yaml`) that describes how native model variables (e.g., `TREFHT`) map to CMIP names (e.g., `tas`), including unit conversions and multi-variable formulas.
- File discovery: selects only the timeseries files needed for the requested CMIP variables.
- Realization: evaluates the mapping (direct rename, scaling, or formula) to produce CMIP DataArrays.
- Vertical interpolation: optionally interpolates hybrid-sigma level variables to standard CMIP pressure grids (e.g., `plev19`, `plev39`) using geocat-comp.
- Regridding: regrids from native spectral element (SE) or tripolar ocean grids to 1° lat/lon using precomputed ESMF weight files via xESMF.
- CMORization: writes CMOR-compliant output with correct metadata, bounds, and fill values using the CMOR library.
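
To make the vertical interpolation step more concrete, here is a minimal pure-Python sketch of the idea for a single column. It is not the geocat-comp implementation and not code from this package: the function names, the reference pressure `P0`, and the choice of log-pressure interpolation are all assumptions made for illustration. It relies only on the standard hybrid-sigma relation `p = hyam * P0 + hybm * PS`.

```python
import math

P0 = 100000.0  # reference pressure in Pa (assumed value for this sketch)


def midlevel_pressure(hyam, hybm, ps, p0=P0):
    """Pressure at hybrid mid-levels for one column: p = hyam * p0 + hybm * ps."""
    return [a * p0 + b * ps for a, b in zip(hyam, hybm)]


def interp_to_plev(values, pressures, target):
    """Interpolate one column linearly in log(pressure) to a target level.

    pressures must increase with index (model top first). Returns None when
    the target lies outside the column; a real tool would mask or extrapolate.
    """
    for k in range(len(pressures) - 1):
        p_top, p_bot = pressures[k], pressures[k + 1]
        if p_top <= target <= p_bot:
            w = (math.log(target) - math.log(p_top)) / (
                math.log(p_bot) - math.log(p_top)
            )
            return values[k] + w * (values[k + 1] - values[k])
    return None


# Hypothetical three-level column: coefficients, surface pressure, temperature.
pressures = midlevel_pressure([0.1, 0.05, 0.0], [0.0, 0.5, 1.0], ps=100000.0)
ta_at_550hpa = interp_to_plev([220.0, 260.0, 288.0], pressures, 55000.0)
```

In the package itself this work is delegated to geocat-comp, which operates on whole arrays rather than on one column at a time.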
| Model | Atmosphere grid | Ocean grid |
|---|---|---|
| CESM | ne30pg3 (SE) | tx2_3v2 (tripolar) |
| NorESM | ne30pg3 / ne16pg3 (SE) | — |
A conda environment with the required dependencies:
```bash
conda create -n cmip7-prep python=3.13 \
    xarray numpy dask xesmf cmor cftime pyyaml geocat-comp
conda activate cmip7-prep
pip install -e .
```

Derecho (CESM):

```bash
module load conda
conda activate /glade/work/jedwards/conda-envs/CMORDEV
pip install -e .
```

NIRD (NorESM):

```bash
conda activate /projects/NS9560K/diagnostics/cmordev_env/
pip install -e .
```

Make sure you have generated timeseries files for the run before starting.
General usage via `cmor_driver.py`:

```bash
# Atmosphere variables
python scripts/cmor_driver.py --realm atmos --tsdir /path/to/timeseries/
# Land variables
python scripts/cmor_driver.py --realm land --tsdir /path/to/timeseries/
```

Derecho:

```bash
qcmd -- python scripts/cmor_driver.py --realm atmos --tsdir /path/to/timeseries/
```

The mapping YAML files live in `data/`. Each entry describes how a native model variable maps to a CMIP variable.
Keys use the form `<cmip_name>_<frequency>-<level>-<grid>-<realm>`:
```yaml
# Simple source mapping with unit scaling
pr_tavg-u-hxy-u:
  table: atmos
  units: kg m-2 s-1
  sources:
    - model_var: PRECT
      scale: 1000.0  # m/s -> kg m-2 s-1

# Formula combining multiple variables
clt_tavg-u-hxy-u:
  table: atmos
  units: "%"
  formula: CLDTOT * 100
  sources:
    - model_var: CLDTOT

# Pressure-level variable
ta_tavg-p19-hxy-air:
  table: atmos
  units: K
  dims: [time, plev, lat, lon]
  levels:
    name: plev19
    units: Pa
  sources:
    - model_var: T

# Hybrid-sigma level variable
cl_tavg-al-hxy-u:
  table: atmos
  units: "%"
  formula: CLOUD * 100
  dims: [time, lev, lat, lon]
  levels:
    name: standard_hybrid_sigma
    src_axis_name: lev
  sources:
    - model_var: CLOUD
```

Both CESM (`cesm_to_cmip7.yaml`) and NorESM (`noresm_to_cmip7.yaml`) mappings are included.
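
The way an entry like those above turns native values into a CMIP value can be sketched in a few lines of plain Python. This is an illustration only, not the package's `Mapping` or `VarConfig` code: `realize`, `_eval`, and the `MAPPING` dict are hypothetical names, the entries are assumed to be already parsed from YAML, and applying `scale` to each source before evaluating the formula is an assumption. Scalars stand in for the arrays used in practice.

```python
import ast
import operator

# Hypothetical, already-parsed mapping entries mirroring two YAML examples.
MAPPING = {
    "pr_tavg-u-hxy-u": {
        "units": "kg m-2 s-1",
        "sources": [{"model_var": "PRECT", "scale": 1000.0}],
    },
    "clt_tavg-u-hxy-u": {
        "units": "%",
        "formula": "CLDTOT * 100",
        "sources": [{"model_var": "CLDTOT"}],
    },
}

_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, env):
    """Evaluate a small arithmetic expression tree without calling eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


def realize(entry, native):
    """Return the CMIP value for one entry; native maps model names to values."""
    env = {}
    for src in entry["sources"]:
        name = src["model_var"]
        env[name] = native[name] * src.get("scale", 1.0)  # assumed: scale first
    if "formula" in entry:
        return _eval(ast.parse(entry["formula"], mode="eval"), env)
    # No formula: a rename (plus optional scaling) of exactly one source.
    (value,) = env.values()
    return value


print(realize(MAPPING["clt_tavg-u-hxy-u"], {"CLDTOT": 0.25}))  # 25.0
```

Walking the parsed expression instead of calling `eval()` restricts formulas to arithmetic on the named sources, which is a reasonable precaution when the formulas come from a shared spreadsheet.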
The CESM variable mapping is maintained in a Google Spreadsheet and stored in version control as data/cesm_to_cmip7.yaml.
Spreadsheet: https://docs.google.com/spreadsheets/d/1BJV6CLgCTUpuaUlEQoFc-7ATBCsoJezYkyCvU1NTlxw/edit?usp=sharing
Columns A–F and L–S are populated from CMIP7 table metadata. Columns G–K describe how each CMIP variable is generated from CESM model output and are the ones to fill in:
| Col | Name | Description |
|---|---|---|
| G | CESM Variable Name | The CESM variable(s) needed as input, comma-separated, e.g. `PRECC, PRECL` |
| H | Formula | Expression used to compute the CMIP variable from the CESM inputs, e.g. `(PRECC + PRECL) * 1000.0`. Leave blank when the input variable needs only a rename or scaling |
| I | Scale | Multiplicative scale factor applied to each input variable, comma-separated and positionally aligned with column G, e.g. `1000.0` |
| J | Freq | Sampling frequency of each input variable, e.g. `day` for daily fields |
| K | Alias | Rename each input variable before use, comma-separated and positionally aligned with column G |
- Open the spreadsheet and fill in columns G–K for any variables that are missing a `CESM Variable Name`.
- Export as CSV: File → Download → Comma Separated Values (.csv)
- Save the downloaded file as `data/cesm_data.csv`.
- Regenerate the YAML:

  ```bash
  python scripts/convert_csv_to_yaml.py --model cesm \
      --input data/cesm_data.csv \
      --output data/cesm_to_cmip7.yaml
  ```
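
The positional alignment between columns G, I, and K is easiest to see in code. The sketch below is not `convert_csv_to_yaml.py`: it is a hypothetical stand-in that assumes illustrative CSV header names and output keys (`alias`, `freq`), and it only shows how comma-separated cells could be paired up by position.

```python
import csv
import io

# Hypothetical header names; a real export uses the spreadsheet's own titles.
SAMPLE_CSV = """CMIP Name,CESM Variable Name,Formula,Scale,Freq,Alias
pr,"PRECC, PRECL",PRECC + PRECL,"1000.0, 1000.0",,
tas,TREFHT,,,,
"""


def split_cell(cell):
    """Split a comma-separated cell, keeping positions (blank items stay blank)."""
    return [item.strip() for item in cell.split(",")] if cell.strip() else []


def row_to_entry(row):
    """Build a mapping-style dict from the G-K columns of one spreadsheet row."""
    names = split_cell(row["CESM Variable Name"])
    scales = split_cell(row["Scale"])
    aliases = split_cell(row["Alias"])
    sources = []
    for i, name in enumerate(names):
        src = {"model_var": name}
        if i < len(scales) and scales[i]:
            src["scale"] = float(scales[i])  # aligned with column G by position
        if i < len(aliases) and aliases[i]:
            src["alias"] = aliases[i]
        if row["Freq"].strip():
            src["freq"] = row["Freq"].strip()
        sources.append(src)
    entry = {"sources": sources}
    if row["Formula"].strip():
        entry["formula"] = row["Formula"].strip()
    return entry


entries = {
    row["CMIP Name"]: row_to_entry(row)
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
}
```

Keeping blank items in place, rather than dropping them, preserves the alignment when only some of the inputs need a scale or an alias.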
| Module | Purpose |
|---|---|
| `cmip7_prep.mapping_compat` | Load and evaluate YAML mapping files; `Mapping`, `VarConfig` |
| `cmip7_prep.pipeline` | File discovery, dataset opening, vertical transform dispatch |
| `cmip7_prep.regrid` | Regrid to 1° lat/lon for CESM/NorESM or 2° lat/lon for NorESM via precomputed ESMF weight files |
| `cmip7_prep.vertical` | Hybrid-sigma → pressure-level interpolation (geocat-comp) |
| `cmip7_prep.cmor_writer` | Write CMOR-compliant output (`CmorSession`) |
| `cmip7_prep.cmor_utils` | Fill values, time encoding, bounds, monotonicity utilities |
| `cmip7_prep.cache_tools` | Regridder and FX field caching (`RegridderCache`, `FXCache`) |
| `cmip7_prep.mom6_static` | Read MOM6 static grid for ocean FX fields |
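
For readers new to precomputed regridding weights, the following pure-Python sketch shows what applying them amounts to. It is not the `cmip7_prep.regrid` or xESMF code: `apply_weights` is a hypothetical helper, and the sketch assumes the common ESMF weight-file layout of 1-based destination (`row`) and source (`col`) indices paired with a weight (`S`), applied to flattened grids.

```python
def apply_weights(rows, cols, weights, src, n_dst):
    """Apply sparse regridding weights to a flattened source field.

    rows and cols are 1-based destination and source cell indices, as assumed
    for ESMF weight files; each triplet adds weights[i] * src[col] to dst[row].
    """
    dst = [0.0] * n_dst
    for r, c, w in zip(rows, cols, weights):
        dst[r - 1] += w * src[c - 1]
    return dst


# Three source cells remapped onto two destination cells (made-up weights).
print(apply_weights(
    rows=[1, 1, 2, 2],
    cols=[1, 2, 2, 3],
    weights=[0.5, 0.5, 0.25, 0.75],
    src=[1.0, 3.0, 5.0],
    n_dst=2,
))  # [2.0, 4.5]
```

Because the weights depend only on the source and destination grids, they can be computed once and reused for every variable and time step, which is why caching regridders pays off.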
Run the test suite with `pytest`. Doctests in all source modules are run automatically via `--doctest-modules` (configured in `pytest.ini`).
The data/ directory contains:
| File | Description |
|---|---|
| `cesm_to_cmip7.yaml` | CESM → CMIP7 variable mapping |
| `noresm_to_cmip7.yaml` | NorESM → CMIP7 variable mapping |
| `cmor_dataset.json` | Default CMOR dataset attributes |
| `piControl.json` | CMOR experiment metadata for piControl |
| `depth_bnds.nc` | Soil level depth bounds for `sdepth` axis |
| `ocean_geometry.nc` | MOM6 ocean grid geometry |