Add Dataset Attribute type for Pytrees #5732

brownj85 · 2024-05-23T18:39:08Z

Context:
Pytrees are an ideal format for dataset attributes: they provide a compact representation of deeply nested structures, and PennyLane operators can easily be converted to and from pytrees.

Shortcut story: https://app.shortcut.com/xanaduai/story/63174/datasets-serialization-using-pytrees

Description of the Change:

- Adds a `pytree.serialization` module for converting `PyTreeStructure` objects to and from a JSON representation
- Adds a `DatasetPyTree` data attribute type that can store any pytree-compatible type
- `DatasetPyTree` is now the default attribute type for `Operator`
- The `pytree.unflatten` method disables recording, to prevent deserialized operators from being queued
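
To make the flatten / serialize / unflatten idea concrete, here is a minimal sketch using only the standard library. The `flatten` and `unflatten` helpers and the `{"type", "keys", "children"}` layout are assumptions made up for this illustration; they are not the `pytree.serialization` API or the JSON format this PR actually uses.

```python
import json
from typing import Any

# Hypothetical helpers for illustration only; not PennyLane's pytree API.
def flatten(obj: Any):
    """Split a nested list/tuple/dict into flat leaves plus a JSON-friendly structure."""
    if isinstance(obj, dict):
        keys, children, kind = list(obj.keys()), list(obj.values()), "dict"
    elif isinstance(obj, (list, tuple)):
        keys, children, kind = None, list(obj), type(obj).__name__
    else:
        return [obj], None  # a leaf carries no structure of its own

    leaves, child_structs = [], []
    for child in children:
        sub_leaves, sub_struct = flatten(child)
        leaves.extend(sub_leaves)
        child_structs.append(sub_struct)
    return leaves, {"type": kind, "keys": keys, "children": child_structs}


def unflatten(leaves, structure):
    """Rebuild the nested object by consuming the leaves in order."""
    remaining = iter(leaves)

    def build(struct):
        if struct is None:
            return next(remaining)
        items = [build(child) for child in struct["children"]]
        if struct["type"] == "dict":
            return dict(zip(struct["keys"], items))
        return tuple(items) if struct["type"] == "tuple" else items

    return build(structure)


data = {"coeffs": [0.5, 1.5], "wires": (0, 1)}
leaves, structure = flatten(data)
encoded = json.dumps(structure)  # the structure alone is plain JSON
restored = unflatten(leaves, json.loads(encoded))
```

The leaves and the structure are stored separately, which is what lets the numeric leaves go into a compact container while the structure stays a small JSON string.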

Benefits:

- Datasets support any PennyLane object with a pytree representation and serializable metadata
- Complex serialization logic in `DatasetOperator` can be deprecated
- Reduces file size of datasets

Possible Drawbacks:

- Pytree datasets will not be compatible with versions < 0.37
- Type information for metadata is not preserved: `tuple`, `Wires`, and `Shots` objects will all be converted to `list` on a round-trip. `__init__` methods are responsible for ensuring that metadata is converted to the correct type
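
The loss of type information can be reproduced with the standard `json` module alone. In the sketch below, `ToyOp` is a hypothetical class invented for illustration (it is not a PennyLane type); it shows a tuple coming back as a list and an `__init__` that restores the intended type.

```python
import json


class ToyOp:
    """Hypothetical operator-like class; not a PennyLane type."""

    def __init__(self, wires, shots):
        # Normalize on construction, because JSON hands back lists.
        self.wires = tuple(wires)
        self.shots = tuple(shots)

    def metadata(self):
        return {"wires": self.wires, "shots": self.shots}


original = ToyOp(wires=(0, 1), shots=(100, 200))

# Round-trip the metadata through JSON: tuples become lists.
decoded = json.loads(json.dumps(original.metadata()))

# __init__ converts the lists back to the expected type.
rebuilt = ToyOp(**decoded)
```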

Co-authored-by: Thomas R. Bromley <49409390+trbromley@users.noreply.github.com> Co-authored-by: Jack Brown <jack@xanadu.ai> Co-authored-by: Mudit Pandey <mudit.pandey@xanadu.ai>

Co-authored-by: Jack Brown <jack@xanadu.ai>

mudit2812

Thanks @brownj85. Very excited to get this in!

pennylane/data/attributes/operator/operator.py

albi3ro

Any security concerns about deserializing json?

brownj85 · 2024-05-28T18:32:13Z

> Any security concerns about deserializing json?

Not that I can think of. A very large JSON string could cause the interpreter to consume a lot of CPU and memory, but that's not a huge concern since we're not running this on a server.

DSGuala · 2024-06-10T15:44:01Z

pennylane/measurements/shots.py

-from typing import NamedTuple, Sequence, Tuple
+from typing import NamedTuple


Any chance of this causing issues with shot-based measurements? The tests seem to be passing, so maybe it's not a problem?

brownj85 · 2024-08-08

It's hard to be 100% sure, but it shouldn't cause a problem: `Sequence` is less restrictive than `tuple`, so anything that worked before will still work.
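
As a small illustration of why accepting a sequence is the less restrictive choice, the sketch below defines a hypothetical `total_shots` helper (not part of `pennylane/measurements/shots.py`) that takes any sequence, so both tuples and lists are valid inputs.

```python
from collections.abc import Sequence


def total_shots(shot_vector: Sequence[int]) -> int:
    """Hypothetical helper: sum a shot vector given as any sequence."""
    if not isinstance(shot_vector, Sequence):
        raise TypeError("shot_vector must be a sequence")
    return sum(shot_vector)


from_tuple = total_shots((10, 20, 30))  # a tuple is a Sequence
from_list = total_shots([10, 20, 30])   # so is a list
```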

DSGuala · 2024-06-10T15:56:26Z

tests/data/attributes/operator/test_operator.py

+
+ d.op = qml.PauliX(0)
+
+ assert isinstance(d.attrs["op"], DatasetPyTree)


Why does this pass while d.op fails? 🤔

brownj85 · 2024-08-08

Accessing `d.op` returns the deserialized object, e.g. `PauliX` in this case. `attrs["op"]` returns the underlying dataset attribute type that's used for deserialization.
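
The distinction between the stored attribute and the value it yields can be sketched with two toy classes. `ToyAttribute` and `ToyDataset` are hypothetical names for this illustration; the real `Dataset` and `DatasetPyTree` classes are more involved.

```python
import json


class ToyAttribute:
    """Hypothetical stand-in for a dataset attribute type."""

    def __init__(self, value):
        self._raw = json.dumps(value)  # the stored, serialized form

    def get_value(self):
        return json.loads(self._raw)   # deserialized on access


class ToyDataset:
    """Hypothetical dataset: attrs holds wrappers, attribute access unwraps."""

    def __init__(self):
        # Bypass our own __setattr__ so attrs is a plain instance attribute.
        object.__setattr__(self, "attrs", {})

    def __setattr__(self, name, value):
        self.attrs[name] = ToyAttribute(value)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for dataset attributes.
        try:
            return self.attrs[name].get_value()
        except KeyError:
            raise AttributeError(name) from None


d = ToyDataset()
d.op = {"name": "PauliX", "wires": [0]}
```

Here `d.attrs["op"]` is the wrapper, so an `isinstance` check against the attribute type passes, while `d.op` is the unwrapped value and would fail the same check.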

pennylane/data/attributes/operator/operator.py

doc/releases/changelog-dev.md

pennylane/data/attributes/operator/operator.py

pennylane/data/attributes/pytree.py

obliviateandsurrender · 2024-06-18T19:34:18Z

pennylane/data/attributes/pytree.py

+ # but will fail if the leaves are not homogenous
+ DatasetArray(leaves, parent_and_key=(bind, "leaves"))
+ except (ValueError, TypeError):
+ DatasetList(leaves, parent_and_key=(bind, "leaves"))


Catching exceptions might be expensive for large data. Is there a way to rather rely on if conditions here?

brownj85 · 2024-08-08

I think it's better to delegate to numpy here. Otherwise we'd need to implement a check that `leaves` is homogeneous and array-compatible, which would likely just duplicate numpy's logic. That check would also be slower in the ideal case where `leaves` is homogeneous.

I had the same concern about performance, but datasets are read a lot more often than they're written, and `DatasetArray` is a lot more compact and performant than `DatasetList`. So the tradeoff makes sense IMO.
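
The "try the compact container, fall back on failure" pattern discussed above can be sketched without NumPy. This example uses the standard-library `array` module as a stand-in for `DatasetArray` and a plain `list` as a stand-in for `DatasetList`; `store_leaves` is a hypothetical name, and the real code delegates the homogeneity check to NumPy rather than to `array`.

```python
from array import array


def store_leaves(leaves):
    """Try the compact typed container first; fall back to a plain list."""
    try:
        # array("d", ...) rejects anything that is not a real number,
        # so the container itself performs the homogeneity check.
        return array("d", leaves)
    except (TypeError, ValueError):
        return list(leaves)


compact = store_leaves([1, 2.5, 3])      # homogeneous numbers
fallback = store_leaves([1.0, "x", 3])   # mixed types
```

In the homogeneous case no separate pre-check runs, which is the argument made in the reply: the exception path is only paid when the data cannot be stored compactly anyway.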

obliviateandsurrender · 2024-06-18T20:06:32Z

pennylane/pytrees/pytrees.py


 if has_jax:
 _register_pytree_with_jax(pytree_type, flatten_fn, unflatten_fn)


+def is_pytree(type_: type[Any]) -> bool:


Do we need to check for some kind of internal structure here as well?

brownj85 · 2024-08-08

Based on the JAX definition of a pytree, I don't think so. Any container-like type (`list`, `dict`) or registered type is a pytree, even if it contains objects ("leaves") that aren't.
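
A type-level check of this kind can be sketched as follows. The registry, the `register_pytree` helper, and the set of built-in containers are assumptions for illustration; the body of the `is_pytree` added in this PR is not shown in the thread and may differ.

```python
# Hypothetical registry: type -> (flatten_fn, unflatten_fn)
_registry = {}
_BUILTIN_CONTAINERS = (list, tuple, dict)


def register_pytree(type_, flatten_fn, unflatten_fn):
    """Record how to flatten and unflatten instances of type_."""
    _registry[type_] = (flatten_fn, unflatten_fn)


def is_pytree(type_: type) -> bool:
    """True for built-in containers and registered types.

    Only the type is inspected; the contents of an instance are not.
    """
    return type_ in _BUILTIN_CONTAINERS or type_ in _registry


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


before = is_pytree(Point)
register_pytree(Point, lambda p: ((p.x, p.y), None), lambda xy, _: Point(*xy))
after = is_pytree(Point)
```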

Co-authored-by: Utkarsh <utkarshazad98@gmail.com>

DSGuala · 2024-06-20T13:15:48Z

Looks good to me 👍
I don't think any of the pending questions/comments are blocking. We can merge for the release.

**Context:**
PyTrees are an ideal format for dataset attributes, since they provide a compact representation of deeply nested structures, and Pennylane Operators can easily be converted to and from pytrees.

Shortcut story: https://app.shortcut.com/xanaduai/story/63174/datasets-serialization-using-pytrees

**Description of the Change:**
- Adds a `pytree.serialization` module for converting `PyTreeStructure` objects to and from a JSON representation
- Adds a `DatasetPyTree` data attribute type that can store any pytree-compatible type
- `DatasetPyTree` is now the default attribute type for `Operator`
- The `pytree.unflatten` method disables recording, to prevent deserialized operators from being queued

**Benefits:**
- Datasets support any Pennylane object with a Pytree representation and serializable metadata
- Complex serialization logic in ``DatasetOperator`` can be deprecated
- Reduces file size of datasets

**Possible Drawbacks:**
- Pytree datasets will not be compatible with versions < 0.37
- Type information for metadata is not preserved - `tuple`, `Wires`, and `Shots` objects will all be converted to `list` on a round-trip. `__init__` methods are responsible for ensuring that metadata is converted to the correct type

---------

Co-authored-by: albi3ro <chrissie.c.l@gmail.com>
Co-authored-by: Christina Lee <christina@xanadu.ai>
Co-authored-by: Thomas R. Bromley <49409390+trbromley@users.noreply.github.com>
Co-authored-by: Mudit Pandey <mudit.pandey@xanadu.ai>
Co-authored-by: Paul Finlay <50180049+doctorperceptron@users.noreply.github.com>
Co-authored-by: Diego <67476785+DSGuala@users.noreply.github.com>
Co-authored-by: Utkarsh <utkarshazad98@gmail.com>
Co-authored-by: Diego <diego_guala@hotmail.com>

albi3ro and others added 19 commits May 13, 2024 17:56

add tools for flattening and unflattening pytrees

253515f

Merge branch 'master' into pytree-flatten-unflatten

7180296

adding coverage

1a7fa19

Apply suggestions from code review

9858055

Co-authored-by: Thomas R. Bromley <49409390+trbromley@users.noreply.github.com> Co-authored-by: Jack Brown <jack@xanadu.ai> Co-authored-by: Mudit Pandey <mudit.pandey@xanadu.ai>

responding to feedback, leaf is PyTreeStructure with no type

a618d21

Merge branch 'master' into pytree-flatten-unflatten

dd5c7e4

pytree module

c7ca2c1

Update pennylane/pytrees.py

45141b3

Co-authored-by: Jack Brown <jack@xanadu.ai>

Update pennylane/pytrees.py

8086b3f

Co-authored-by: Jack Brown <jack@xanadu.ai>

Apply suggestions from code review

521e7fd

Co-authored-by: Jack Brown <jack@xanadu.ai>

change repr, add str

450de94

add serialization

73ec5ed

Merge branch 'pytree-flatten-unflatten' into datasets-pytrees

6c71b57

tests

0757766

Merge branch 'master' into pytree-flatten-unflatten

2882ce5

tests, docs

5a6d0e5

tests

dcf1bb9

Merge branch 'pytree-flatten-unflatten' into datasets-pytrees

4088fa6

update changelog

fafc2fe

Base automatically changed from pytree-flatten-unflatten to master May 24, 2024 14:28

brownj85 added 8 commits May 24, 2024 11:02

Merge branch 'master' into datasets-pytrees

703e3ab

refactor json handling

73d1626

tests

f5d209d

Merge branch 'master' into datasets-pytrees

e7af0b3

codefactor

5d63558

don't use | for union

c499706

remove dupe test file

215f39c

pylint

7f7d582

brownj85 changed the title ~~Datasets pytrees~~ Add Dataset Attribute type for Pytrees May 24, 2024

Merge branch 'master' into datasets-pytrees

0c13848

docstrings

11aa983

brownj85 requested a review from mudit2812 May 28, 2024 14:19

mudit2812 approved these changes May 28, 2024


albi3ro reviewed May 28, 2024


pennylane/data/attributes/operator/operator.py

albi3ro reviewed May 28, 2024


Merge branch 'master' into datasets-pytrees

8bfc3f2

brownj85 requested a review from albi3ro May 30, 2024 14:36

brownj85 added 3 commits May 30, 2024 16:55

Merge branch 'master' into datasets-pytrees

001ada4

Merge branch 'master' into datasets-pytrees

671023c

Merge branch 'master' into datasets-pytrees

4803d12

DSGuala self-requested a review June 4, 2024 19:10

Merge branch 'master' into datasets-pytrees

136a331

DSGuala requested a review from obliviateandsurrender June 6, 2024 13:30

brownj85 and others added 3 commits June 6, 2024 14:46

Merge branch 'master' into datasets-pytrees

2e71c62

Merge branch 'master' into datasets-pytrees

b80543d

Merge branch 'master' into datasets-pytrees

45b6006

DSGuala reviewed Jun 17, 2024


DSGuala added this to the v0.37 milestone Jun 17, 2024

obliviateandsurrender reviewed Jun 18, 2024


Apply suggestions from code review

1333128

Co-authored-by: Utkarsh <utkarshazad98@gmail.com>

DSGuala approved these changes Jun 20, 2024


DSGuala and others added 3 commits June 20, 2024 14:05

Merge branch 'master' into datasets-pytrees

2904d1e

Merge branch 'master' into datasets-pytrees

e22b1dc

Merge branch 'master' into datasets-pytrees

f8ce69e

DSGuala enabled auto-merge (squash) June 21, 2024 13:28

DSGuala merged commit 21b40c5 into master Jun 21, 2024
40 checks passed

DSGuala deleted the datasets-pytrees branch June 21, 2024 14:11


DSGuala Jun 10, 2024

brownj85 Aug 8, 2024

DSGuala Jun 10, 2024

brownj85 Aug 8, 2024

obliviateandsurrender Jun 18, 2024

brownj85 Aug 8, 2024

obliviateandsurrender Jun 18, 2024

brownj85 Aug 8, 2024
