(fix): structured arrays for v2 #2681

ilan-gold · 2025-01-09T21:29:06Z

This is a best guess based on https://numpy.org/doc/2.1/reference/generated/numpy.dtype.kind.html and the fact that VLenBytes appears to be explicitly for strings.
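For reference, here is a quick look at the kind codes involved. This is a sketch using only NumPy, and the example dtypes are illustrative. A raw void dtype such as V8 and a structured dtype both report kind 'V', so kind alone cannot tell them apart. Checking dtype.fields is one way to distinguish them.

```python
import numpy as np

# Illustrative dtypes. Kind codes per numpy.dtype.kind:
# 'V' = void (raw or structured), 'S' = bytes, 'U' = unicode, 'O' = object.
examples = {
    "raw void": np.dtype("V8"),
    "structured": np.dtype([("a", "<i4"), ("b", "<f8")]),
    "fixed bytes": np.dtype("S5"),
    "fixed unicode": np.dtype("U5"),
    "object": np.dtype("O"),
}

for label, dt in examples.items():
    # `fields` is None for every dtype except structured ones.
    print(f"{label:14} kind={dt.kind!r} structured={dt.fields is not None}")
```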

This addresses the v2 case of #2134

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/tutorial.rst
Changes documented in docs/release.rst
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

ilan-gold · 2025-01-10T15:31:32Z

Hmm, this will need to handle the case where the array is not given the dtype.

ilan-gold · 2025-01-10T15:54:02Z

(P.S. I used zarr.create(path=...), which didn't error but wrote nothing to disk, because it silently uses a MemoryStore. This seems like an easy mistake to make, i.e., passing path instead of store.)


src/zarr/core/buffer/core.py

ilan-gold · 2025-01-14T04:06:05Z

src/zarr/core/metadata/v2.py

+        # In the case of zarr v2, the simplest (i.e., '|VXX') dtype is represented as a string
+        dtype_descr = self.dtype.descr
+        # Check the length before indexing dtype_descr[0]
+        if self.dtype.kind == "V" and len(dtype_descr) != 0 and dtype_descr[0][0] != "":
+            dtype_json = tuple(dtype_descr)
+        else:
+            dtype_json = self.dtype.str


This is my attempt to match the old behavior. I haven't looked back at the old code yet, so if someone knows this to be wrong, it would be great to know.
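A minimal standalone sketch of the encoding branch in the diff above. It assumes a hypothetical helper, dtype_to_json, in place of the method on the v2 metadata class. It checks the length of descr before indexing it, and it returns the descr list directly, since JSON writes tuples as lists anyway.

```python
from __future__ import annotations

import json
from typing import Any

import numpy as np


def dtype_to_json(dtype: np.dtype) -> Any:
    """Hypothetical helper mirroring the v2 dtype-encoding branch.

    Structured dtypes are written as their ``descr`` list of fields; every
    other dtype, including a plain ``|VXX`` void, is written as ``dtype.str``.
    """
    descr = dtype.descr
    # Check the length before indexing descr[0].
    if dtype.kind == "V" and len(descr) != 0 and descr[0][0] != "":
        return descr
    return dtype.str


structured = np.dtype([("a", "<i4"), ("b", "<f8", (2,))])
print(json.dumps(dtype_to_json(structured)))  # [["a", "<i4"], ["b", "<f8", [2]]]
print(json.dumps(dtype_to_json(np.dtype("V8"))))  # "|V8"
print(json.dumps(dtype_to_json(np.dtype("<i4"))))  # "<i4"
```

Note that the tuples inside descr, including the sub-array shape (2,), come back as lists after a JSON round trip. That is why the read side needs a list-to-tuple conversion.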

Looks right to me

martindurant

I am keen to see this go in

src/zarr/core/buffer/core.py

martindurant · 2025-01-14T21:46:56Z

src/zarr/core/metadata/v2.py

@@ -220,6 +227,8 @@ def update_attributes(self, attributes: dict[str, JSON]) -> Self:


 def parse_dtype(data: npt.DTypeLike) -> np.dtype[Any]:
+    if isinstance(data, list):  # this is a valid _VoidDTypeLike check


Any iterable?

This is to handle the [(field_name, field_dtype, field_shape), ...] case on https://numpy.org/doc/2.1/reference/arrays.dtypes.html#specifying-and-constructing-data-types but at the same time to obey

This might require more stringent checking or tests; I'm not sure. The reason for this tuple conversion is that data types read from disk arrive as lists of lists, not lists of tuples. So maybe we should check that data is a list and that data[0] is also a list, and throw an error if it isn't? I'm not sure what else could be in the lists, though.

I guess the dtype constructor would make an exception (or our own comprehension fails) in the case the JSON on disk was edited - so I'm not too worried.
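Here is a sketch of the decoding direction that includes the stricter check floated above: every top-level entry must itself be a list. The helper names parse_dtype_json and _fields_from_json are hypothetical and are not zarr's parse_dtype. The sketch also goes beyond the thread in two ways: it recurses into nested structured dtypes, and it converts sub-array shapes back to tuples.

```python
from __future__ import annotations

import json
from typing import Any

import numpy as np


def _fields_from_json(fields: list[Any]) -> list[tuple[Any, ...]]:
    """Hypothetical helper: convert JSON field lists back into numpy tuples."""
    converted = []
    for field in fields:
        # Stricter check: each field must itself be a list of the form
        # [name, dtype] or [name, dtype, shape].
        if not isinstance(field, list) or len(field) not in (2, 3):
            raise ValueError(f"invalid structured dtype field: {field!r}")
        name, sub_dtype = field[0], field[1]
        if isinstance(sub_dtype, list):
            # Nested structured dtype: its fields are lists too, so recurse.
            sub_dtype = _fields_from_json(sub_dtype)
        entry: tuple[Any, ...] = (name, sub_dtype)
        if len(field) == 3:
            shape = field[2]
            # Sub-array shapes come back from JSON as lists; use a tuple.
            entry += (tuple(shape) if isinstance(shape, list) else shape,)
        converted.append(entry)
    return converted


def parse_dtype_json(data: Any) -> np.dtype:
    """Hypothetical decoder for the dtype value stored in .zarray."""
    if isinstance(data, list):
        return np.dtype(_fields_from_json(data))
    return np.dtype(data)


# Round trip: encode as descr, write and read back as JSON, then decode.
original = np.dtype(
    [("a", "<i4"), ("b", "<f8", (2,)), ("p", [("x", "<f4"), ("y", "<f4")])]
)
on_disk = json.loads(json.dumps(original.descr))
print(on_disk)
print(parse_dtype_json(on_disk) == original)
```

As martindurant notes, a hand-edited file would usually fail inside np.dtype anyway. The explicit check simply turns a malformed top-level entry into a clearer error message.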


martindurant · 2025-01-20T13:29:26Z


martindurant · 2025-01-21T20:13:32Z

ping @d-v-b - I'd be happy to merge if you have no objections.

d-v-b · 2025-01-21T20:18:30Z

I think this looks good, but I'm admittedly not a structured dtype user, so I can't give it a very close examination. I think the only thing remaining is to use the new changelog system added by #2736.

martindurant · 2025-01-21T20:38:51Z

I moved the changelog line to where it should be and will merge this when green. I'll also leave a note on the towncrier PR thread saying that this will need dealing with (that PR is not yet merged).

d-v-b · 2025-01-21T20:39:58Z

Great, thanks for working on this @ilan-gold and @martindurant!

ilan-gold added 5 commits January 9, 2025 16:17

(fix): allow structured dtype in v2

d46de84

(fix): |V test

2090b98

(fix): lint

f3e2e2d

(fix): handle fill_value

a8d473b

(fix): put back structured array business

7898ef8

ilan-gold marked this pull request as ready for review January 10, 2025 15:13

Merge branch 'main' into ig/structured_arrays_v2

9120425

ilan-gold added 3 commits January 10, 2025 11:03

(fix): dtype encoding roundtrip

9c2efdd

Merge branch 'ig/structured_arrays_v2' of github.com:ilan-gold/zarr-python into ig/structured_arrays_v2

647868c

Merge branch 'main' into ig/structured_arrays_v2

0b207a7

d-v-b reviewed Jan 10, 2025

src/zarr/core/buffer/core.py

dstansby added the needs release notes label Jan 10, 2025

(fix): encode-decode test

b2071a7

github-actions bot removed the needs release notes label Jan 13, 2025

ilan-gold added 3 commits January 13, 2025 15:37

Merge branch 'main' into ig/structured_arrays_v2

d19e968

(chore): rel notes

ff730a3

(fix): docstring test

89cc8d9

ilan-gold commented Jan 14, 2025

Merge branch 'main' into ig/structured_arrays_v2

9c59167

d-v-b mentioned this pull request Jan 14, 2025

Complex dtypes lost when writing V2 arrays #2711

Open

martindurant reviewed Jan 14, 2025

ilan-gold added 2 commits January 14, 2025 19:26

Merge branch 'main' into ig/structured_arrays_v2

7f64bab

Merge branch 'ig/structured_arrays_v2' of github.com:ilan-gold/zarr-python into ig/structured_arrays_v2

b67acb9

e-koch mentioned this pull request Jan 17, 2025

Support for structured dtypes not yet supported in zarr 3 radio-astro-tools/spectral-cube#936

Open

ilan-gold requested review from d-v-b and martindurant January 20, 2025 12:10

Merge branch 'main' into ig/structured_arrays_v2

3e050e7

martindurant approved these changes Jan 20, 2025

dstansby mentioned this pull request Jan 20, 2025

Use towncrier for changelog generation #2736

Merged

Update release-notes.rst

7141de5

martindurant merged commit a260ae9 into zarr-developers:main Jan 21, 2025
30 checks passed

ilan-gold deleted the ig/structured_arrays_v2 branch January 22, 2025 12:14
