Using hidefix to determine byte ranges in HDF files? #38

TomNicholas · 2024-04-10T19:55:05Z

I'm building VirtualiZarr, an evolution of kerchunk that lets you determine the byte ranges of chunks in netCDF files and then concatenate the virtual representations of those chunks using xarray's API.

This works by creating a ChunkManifest object in-memory (one per netCDF Variable per file initially), then defining ways to merge those manifests.
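To make the manifest idea concrete, here is a minimal sketch of what "one manifest per variable per file, then merge" could look like. It is not VirtualiZarr's actual `ChunkManifest` class: the names `ChunkEntry` and `concat_manifests`, the dict layout, and the file names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class ChunkEntry(NamedTuple):
    """Where one chunk lives (hypothetical layout, for illustration only)."""
    path: str    # file containing the chunk
    offset: int  # byte offset of the chunk within that file
    length: int  # number of bytes the stored chunk occupies


# A manifest maps a chunk-grid index, e.g. (0, 0), to the chunk's location.
Manifest = Dict[Tuple[int, ...], ChunkEntry]


def concat_manifests(
    manifests: List[Manifest],
    chunk_grid_shapes: List[Tuple[int, ...]],
    axis: int = 0,
) -> Manifest:
    """Concatenate manifests along `axis` by shifting chunk-grid indices.

    Only the indices change; the byte ranges themselves are left untouched.
    """
    merged: Manifest = {}
    shift = 0
    for manifest, grid_shape in zip(manifests, chunk_grid_shapes):
        for index, entry in manifest.items():
            new_index = index[:axis] + (index[axis] + shift,) + index[axis + 1:]
            merged[new_index] = entry
        # The next manifest starts after all chunks of this one along `axis`.
        shift += grid_shape[axis]
    return merged


# Two files holding the same variable, with 2 and 1 chunks along axis 0.
a = {(0, 0): ChunkEntry("a.nc", 4096, 1000), (1, 0): ChunkEntry("a.nc", 5096, 1200)}
b = {(0, 0): ChunkEntry("b.nc", 4096, 900)}
combined = concat_manifests([a, b], [(2, 1), (1, 1)], axis=0)
```

In this sketch, merging touches only the chunk-grid keys, so no data is read from either file.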

What I'm wondering is whether hidefix's Index class could be useful to me as a way to generate the ChunkManifest for a netCDF file without using kerchunk/fsspec (see this issue). In other words, I would use hidefix only to determine the byte ranges, not to actually read the data. (I plan to read the bytes later using the Rust object-store crate, see zarr-developers/zarr-python#1661.)

Q's:

Is this idea dumb?
Does hidefix.Index contain the byte range information I'm assuming it does?
Can hidefix read over S3?
Would I be better off just using h5py directly?

cc @norlandrhagen

xref pydata/xarray#7446


gauteh · 2024-04-10T20:19:53Z

Hi,

That is definitely possible. hidefix has multiple reader implementations: some cached, some direct (the fastest, so there is no need to use the cached one), and some async. My plan was to add another reader based on S3 (#26). That would let you use the slicing code in hidefix and get some help from the traits. But you can also use the slicing code directly with only the Index type, as you describe. It will give you the chunks and the byte slices within them (and the byte slices inside the decompressed chunk).
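As an illustration of what "chunks and the byte slices inside the decompressed chunk" means, here is a minimal one-dimensional sketch. It is not hidefix's slicing code or API: the function name `chunk_byte_slices` and its parameters are hypothetical, and it assumes a 1-D dataset with fixed-size chunks and a fixed item size.

```python
from typing import List, Tuple


def chunk_byte_slices(
    start: int, stop: int, chunk_size: int, itemsize: int
) -> List[Tuple[int, int, int]]:
    """Split the element range [start, stop) across fixed-size 1-D chunks.

    Returns (chunk_index, byte_start, byte_stop) tuples, where the byte
    offsets are relative to the start of the *decompressed* chunk.
    """
    pieces = []
    pos = start
    while pos < stop:
        chunk_index = pos // chunk_size
        chunk_start = chunk_index * chunk_size
        # Stop at the end of the request or the end of this chunk.
        end = min(stop, chunk_start + chunk_size)
        pieces.append(
            (chunk_index, (pos - chunk_start) * itemsize, (end - chunk_start) * itemsize)
        )
        pos = end
    return pieces


# Elements 5..22 of a dataset with 10 elements per chunk and 4-byte items.
pieces = chunk_byte_slices(5, 23, chunk_size=10, itemsize=4)
```

Looking up each chunk's location in the file (its offset and stored size) is a separate step that the index would provide; this sketch covers only the arithmetic within chunks.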

Using the faster HDF5 code for building the index makes a big difference in indexing speed; otherwise, it might be worth serializing the index. Index in hidefix implements the serde traits.

Regards, Gaute

TomNicholas · 2024-04-10T20:45:57Z

Thanks for the quick reply! Okay, that's exciting. I'll have to try it out and see if I can get just the byte ranges from the Index class.

TomNicholas mentioned this issue Apr 10, 2024

Using cog3pio to determine byte ranges in COG files? weiji14/cog3pio#16

Open
