grib_tree function fails to properly handle ECMWF ensemble data
Issue Description
The grib_tree function in kerchunk.grib2 doesn't properly handle ECMWF ensemble forecast data, specifically:
- It fails to recognize and preserve ensemble member information
- It significantly reduces the number of groups in the output compared to the input
- It doesn't provide a way to access the ensemble dimension in the resulting zarr structure
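To make the symptom concrete, here is a minimal, self-contained sketch of how ensemble members can be lost when messages are keyed by variable name alone. This is not kerchunk's actual code, and whether grib_tree groups messages this way is an assumption. The message dictionaries, the `number` field name, and both helper functions are hypothetical and exist only for illustration.

```python
from itertools import product

# Hypothetical stand-ins for scanned GRIB messages: 19 variables x 51 members.
VARIABLES = [f"var{i:02d}" for i in range(19)]
MEMBERS = range(51)  # assumed numbering: 0 is the control run, 1..50 are perturbed

messages = [{"variable": v, "number": n} for v, n in product(VARIABLES, MEMBERS)]


def group_by_variable(msgs):
    """Key on variable only: later members overwrite earlier ones."""
    groups = {}
    for msg in msgs:
        groups[msg["variable"]] = msg
    return groups


def group_by_variable_and_member(msgs):
    """Key on (variable, ensemble number): every message keeps its own slot."""
    groups = {}
    for msg in msgs:
        groups[(msg["variable"], msg["number"])] = msg
    return groups


collapsed = group_by_variable(messages)
preserved = group_by_variable_and_member(messages)

# Number of input messages, groups without the member key, groups with it.
print(len(messages), len(collapsed), len(preserved))
```

Under these assumptions, keeping the ensemble number in the grouping key is what keeps all 969 entries distinct. Dropping it leaves one entry per variable.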
Reproduction
When processing ECMWF ensemble data with 19 variables and 51 ensemble members (969 total message groups):
import datatree
import fsspec
from kerchunk.grib2 import scan_grib, grib_tree

date_str = "20240229"
ecmwf_s3url = f"s3://ecmwf-forecasts/{date_str}/00z/ifs/0p25/enfo/{date_str}000000-0h-enfo-ef.grib2"

# Scan every GRIB message in the file, then merge the per-message references
esc_groups = scan_grib(ecmwf_s3url)
original_tree = grib_tree(esc_groups)

gfs_dt = datatree.open_datatree(
    fsspec.filesystem("reference", fo=original_tree).get_mapper(""),
    engine="zarr",
    consolidated=False,
)

# The key test: can we access ensemble members?
print(gfs_dt.keys())  # Check for variables

The resulting structure loses ensemble information, making it impossible to distinguish between different ensemble members in the output.
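One way to quantify the reduction in groups is to count the Zarr v2 metadata keys (`.zgroup` and `.zarray`) in a reference set. The sketch below assumes the references follow the Kerchunk version 1 layout, a dict with a `"refs"` mapping of key paths. The helper name `summarize_refs` and the toy reference set are hypothetical.

```python
import posixpath


def summarize_refs(reference_set):
    """Return (group_paths, array_paths) found in a Kerchunk-style reference dict."""
    # Accept either {"version": 1, "refs": {...}} or a bare refs mapping.
    refs = reference_set.get("refs", reference_set)
    groups = sorted(
        posixpath.dirname(key) for key in refs if posixpath.basename(key) == ".zgroup"
    )
    arrays = sorted(
        posixpath.dirname(key) for key in refs if posixpath.basename(key) == ".zarray"
    )
    return groups, arrays


# Toy reference set for illustration only; real references hold JSON metadata strings.
toy = {
    "version": 1,
    "refs": {
        ".zgroup": "{}",
        "t2m/.zgroup": "{}",
        "t2m/instant/.zgroup": "{}",
        "t2m/instant/t2m/.zarray": "{}",
        "t2m/instant/t2m/0.0": ["file.grib2", 0, 10],
    },
}

groups, arrays = summarize_refs(toy)
print(groups)
print(arrays)
```

If each element returned by scan_grib and the dict returned by grib_tree both have this layout, applying the helper to each side gives a rough before-and-after count of arrays and groups.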
This gist explains the situation and a way forward for preserving the ensemble number in grib_tree.