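pyfive.check_dtype(vlen=var.dtype) and h5py.check_dtype(vlen=var.dtype) return different results for the same variable: pyfive returns a string_info object, while h5py returns str. This breaks xarray downstream when engine="h5netcdf" is used with the pyfive backend.
xarray uses check_dtype to detect variable-length (vlen) strings and then decodes them from object to U dtype, so with pyfive the variable stays object. See below.
MCVE, using the latest versions of pyfive, h5netcdf, and xarray, with h5py=13.5.1 and hdf5=1.14.6: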
import h5py
import pyfive
import xarray as xr
import os
input_string = ["foó", "bár", "baź"]
original = xr.Dataset({"x": input_string})
kwargs = dict(encoding={"x": {"dtype": str}})
fname = "test.nc"
original.to_netcdf(fname, engine="h5netcdf", **kwargs)
print("----- PYFIVE --------------------")
with pyfive.File("test.nc") as fh:
    var = fh["x"]
    print(pyfive.check_dtype(vlen=var.dtype))
    print(var.dtype.metadata)
    print(fh["x"][...])
print("\n----- H5PY --------------------")
with h5py.File("test.nc") as fh:
    var = fh["x"]
    print(h5py.check_dtype(vlen=var.dtype))
    print(var.dtype.metadata)
    print(fh["x"][...])
# Read the same file through xarray, switching the h5netcdf read backend.
backend = "h5py"
os.environ["H5NETCDF_READ_BACKEND"] = backend
print(f"\n----- xarray - h5netcdf - {backend} --------------------")
with xr.open_dataset("test.nc", engine="h5netcdf") as ds:
    print(ds["x"])
backend = "pyfive"
os.environ["H5NETCDF_READ_BACKEND"] = backend
print(f"\n----- xarray - h5netcdf - {backend} --------------------")
with xr.open_dataset("test.nc", engine="h5netcdf") as ds:
    print(ds["x"])
Output:
----- PYFIVE --------------------
string_info(encoding='ascii', length=None)
{'vlen': <class 'str'>}
['foó' 'bár' 'baź']
----- H5PY --------------------
<class 'str'>
{'vlen': <class 'str'>}
[b'fo\xc3\xb3' b'b\xc3\xa1r' b'ba\xc5\xba']
----- xarray - h5netcdf - h5py --------------------
<xarray.DataArray 'x' (x: 3)> Size: 36B
array(['foó', 'bár', 'baź'], dtype='<U3')
Coordinates:
  * x        (x) <U3 36B 'foó' 'bár' 'baź'
----- xarray - h5netcdf - pyfive --------------------
<xarray.DataArray 'x' (x: 3)> Size: 24B
array(['foó', 'bár', 'baź'], dtype=object)
Coordinates:
  * x        (x) object 24B 'foó' 'bár' 'baź'
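
For illustration only, here is a minimal sketch of a backend-independent check. It relies on the observation from the output above that var.dtype.metadata is {'vlen': <class 'str'>} with both backends. The helper name vlen_base_type is hypothetical, and this is not how xarray or h5netcdf actually implement the check; the dtype is built by hand with NumPy as a stand-in for var.dtype.

```python
import numpy as np


def vlen_base_type(dtype):
    """Hypothetical helper: return the 'vlen' entry of dtype.metadata, or None."""
    metadata = getattr(dtype, "metadata", None)
    if metadata is None:
        return None
    return metadata.get("vlen")


# Stand-in for var.dtype from the MCVE: an object dtype tagged as vlen str.
vlen_str_dtype = np.dtype("O", metadata={"vlen": str})

values = np.array(["foó", "bár", "baź"], dtype=object)
if vlen_base_type(vlen_str_dtype) is str:
    # Decode from object to a fixed-width unicode (U) dtype.
    values = values.astype(str)

print(values.dtype)
```

With a check like this the pyfive and h5py paths would agree, because both expose the same metadata; whether the fix belongs in pyfive.check_dtype or in the caller is left open.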