Extend data type support (for bfloat16 in particular) #2656
Comments
I think this should be a priority for zarr v3, and I can't see any technical barriers to it. Would you have time / energy to work on a PR that would add this? Happy to give pointers for where to start.
Yes! And thank you. 😄
Pointers would be very welcome!
this is the function that we use for converting user input into a concrete dtype object: zarr-python/src/zarr/core/common.py Line 171 in f9c2024
bfloat16, as well as the concrete bfloat16 dtype emerge on the other side of this function as concrete instances of bfloat16.
because zarr v3 has its own datatype specification that's designed to be decoupled from numpy, we have a separate parsing step for creating a zarr v3 metadata document: zarr-python/src/zarr/core/metadata/v3.py Line 685 in f9c2024
DataType object defined there, you see that it's basically an enum that contains some helper functions for mapping variants of that enum to strings that can be serialized to zarr metadata documents, and strings that can be used to make a numpy dtype. I think the v3 spec already reserved the name float16 (you should check if bfloat16 is the same as the "IEEE 754 half-precision floating point: sign bit, 5 bits exponent, 10 bits mantissa" referenced by the spec; if so, we can just use the float16 name for the metadata, otherwise we need to make a new name).
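The check suggested above can be done by comparing bit layouts. Below is a minimal sketch using only the Python standard library. It assumes that bfloat16 is the upper 16 bits of an IEEE 754 float32 (1 sign bit, 8 exponent bits, 7 mantissa bits); the helper names are hypothetical and are not part of zarr-python.

```python
import struct


def float16_fields(x: float) -> tuple[int, int, int]:
    """Encode x as IEEE 754 binary16 and return (sign, exponent, mantissa)."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    # Layout: 1 sign bit | 5 exponent bits | 10 mantissa bits
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def bfloat16_fields(x: float) -> tuple[int, int, int]:
    """Truncate x to bfloat16 and return (sign, exponent, mantissa)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    # Assumption: bfloat16 keeps the top 16 bits of the float32 encoding.
    # This sketch truncates rather than rounds.
    bits = bits32 >> 16
    # Layout: 1 sign bit | 8 exponent bits | 7 mantissa bits
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F


print(float16_fields(1.0))   # (0, 15, 0): exponent bias is 15
print(bfloat16_fields(1.0))  # (0, 127, 0): exponent bias is 127
```

The exponent fields differ (5 bits versus 8 bits), so under this assumption bfloat16 is not the half-precision format that the spec calls float16, and it would need its own name in the metadata.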
Once the metadata stuff is ironed out, you would need to check that
This is great. I'll tag you when I have a first draft. Is there a benefit to trying to add support for v2 as well?
I think v2 support would also be great, provided the result complies with the requirements for dtypes in the v2 spec.
@d-v-b I have opened a draft PR to start a discussion on how Nothing is set in stone! I am very open to comments, suggestions in alternative directions etc. But I came to the conclusion that it's probably easiest to try and find a way to remove dependency on the Let me know how you would like to proceed!
Problem
I would like to read/write numpy dtype extensions (such as bfloat16) with zarr version 2. I am using ml_dtypes from JAX for the dtype extensions.
I experience a similar issue when trying to read such dtype extensions.
The problem is related to the extensibility (or lack thereof) of the kind codes in numpy. It is well described by the JAX team.
Background
bfloat16 is a very important dtype in the AI/ML community. I would like to use zarr (and specifically the Python implementation) to share models such as LLMs. However, the lack of bfloat16 support is a major blocker.
Questions
zarr v2?
zarr v3 today?
zarr v3 in the future?
Related issues
#711
cc @jhamman (as suggested by @TomNicholas)
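The kind-code limitation described under Problem can be illustrated with plain numpy. This is a minimal sketch, not a reproduction of the original error. It assumes that an extension dtype such as ml_dtypes.bfloat16 reports kind 'V' and the type string '<V2', and that zarr v2 metadata identifies a dtype by its numpy type string.

```python
import numpy as np

# float16 has a dedicated kind code, so its type string identifies it uniquely.
f16 = np.dtype("float16")
print(f16.kind, f16.str)  # kind is 'f'; the string is '<f2' on little-endian machines
assert np.dtype(f16.str) == f16  # the string -> dtype round trip is lossless

# Assumption: an extension dtype such as ml_dtypes.bfloat16 reports kind 'V'
# and the type string '<V2'. Parsing that string with plain numpy yields only
# a generic 2-byte void dtype, so the bfloat16 identity is lost on the way back.
recovered = np.dtype("<V2")
print(recovered.kind, recovered.itemsize)
assert recovered.kind == "V" and recovered.itemsize == 2
assert recovered != f16
```

Under these assumptions, a type string alone cannot distinguish bfloat16 from any other 2-byte opaque type, which is why a dtype name that does not depend on numpy kind codes is attractive.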