grib_tree function to properly handle ECMWF ensemble data #546

@nishadhka

grib_tree function fails to properly handle ECMWF ensemble data

Issue Description

The grib_tree function in kerchunk.grib2 doesn't properly handle ECMWF ensemble forecast data, specifically:

  1. It fails to recognize and preserve ensemble member information
  2. It produces significantly fewer groups in the output than there are in the input
  3. It doesn't provide a way to access the ensemble dimension in the resulting zarr structure

Reproduction

The following processes ECMWF ensemble data with 19 variables and 51 ensemble members (19 × 51 = 969 message groups in total):

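import datatree
import fsspec  # needed for the reference filesystem below
from kerchunk.grib2 import scan_grib, grib_tree

date_str = '20240229'
ecmwf_s3url = f"s3://ecmwf-forecasts/{date_str}/00z/ifs/0p25/enfo/{date_str}000000-0h-enfo-ef.grib2"

# Scan every GRIB message in the file, then merge the references into one tree
esc_groups = scan_grib(ecmwf_s3url)
original_tree = grib_tree(esc_groups)

# Open the merged references as a datatree through the zarr engine
gfs_dt = datatree.open_datatree(
    fsspec.filesystem("reference", fo=original_tree).get_mapper(""),
    engine="zarr",
    consolidated=False
)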

# The key test: can we access ensemble members?
print(gfs_dt.keys())  # Check for variables

The resulting structure loses ensemble information, making it impossible to distinguish between different ensemble members in the output.
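The collapse described above can be illustrated without kerchunk at all. The sketch below is only an analogy, not grib_tree's actual logic: the `messages` records, the `variable` and `number` keys, and the `group_messages` helper are hypothetical stand-ins for per-message GRIB metadata. It shows how grouping 969 messages by variable alone leaves 19 groups, while including the ensemble number in the key keeps all 969 distinct.

```python
from collections import defaultdict
from itertools import product

# Hypothetical per-message metadata; real scan_grib output is a list of
# reference dicts, not these simplified records.
variables = [f"var{i:02d}" for i in range(19)]
members = range(51)  # assumption: member numbers 0..50
messages = [{"variable": v, "number": m} for v, m in product(variables, members)]


def group_messages(msgs, keys):
    """Group message records by the given metadata keys (hypothetical helper)."""
    groups = defaultdict(list)
    for msg in msgs:
        groups[tuple(msg[k] for k in keys)].append(msg)
    return dict(groups)


# Grouping by variable only: members of the same variable share one key
by_variable = group_messages(messages, ["variable"])

# Grouping by variable and ensemble number: every message keeps its own key
by_variable_and_member = group_messages(messages, ["variable", "number"])

print(len(messages))                # 969
print(len(by_variable))             # 19
print(len(by_variable_and_member))  # 969
```

Whether grib_tree collapses members in exactly this way is not confirmed here; the point is only that any grouping key that omits the ensemble number cannot tell members apart.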

This gist explains the situation and a way forward for including the ensemble number in the grib_tree output.
