-
Notifications
You must be signed in to change notification settings - Fork 585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Dataset Attribute type for Pytrees #5732
Conversation
Co-authored-by: Thomas R. Bromley <49409390+trbromley@users.noreply.github.com> Co-authored-by: Jack Brown <jack@xanadu.ai> Co-authored-by: Mudit Pandey <mudit.pandey@xanadu.ai>
Co-authored-by: Jack Brown <jack@xanadu.ai>
Co-authored-by: Jack Brown <jack@xanadu.ai>
Co-authored-by: Jack Brown <jack@xanadu.ai>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brownj85 . Very excited to get this in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any security concerns about deserializing json?
Not that I can think of - a very large JSON string could cause the interpreter to consume a lot of cpu and memory, but that's not a huge concern since we're not running this on a server. |
from typing import NamedTuple, Sequence, Tuple | ||
from typing import NamedTuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chances of this causing issues with doing shot-based measurements? The tests seem to be passing, so maybe no problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to be 100% sure but it shouldn't cause a problem - Sequence
is less restrictive than tuple
so anything that worked before will still work
|
||
d.op = qml.PauliX(0) | ||
|
||
assert isinstance(d.attrs["op"], DatasetPyTree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this pass while d.op
fails? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing d.op
returns the deserialized object, e.g PauliX
in this case. attrs["op"]
returns the underlying dataset attribute type that's used for de-serialization
# but will fail if the leaves are not homogenous | ||
DatasetArray(leaves, parent_and_key=(bind, "leaves")) | ||
except (ValueError, TypeError): | ||
DatasetList(leaves, parent_and_key=(bind, "leaves")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching exceptions
might be expensive for large data. Is there a way to rather rely on if
conditions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to delegate to numpy here - otherwise we'd need to implement a check that leaves
is homogenous and array-compatible, which would likely just be duplicating numpy's logic. This would also be slower in the ideal case that leaves
is homogenous.
I had the same concern about performance, but datasets are read a lot more than they're written, and DatasetArray
is a lot more compact and performant than DatasetList
. So the tradeoff makes sense IMO
|
||
if has_jax: | ||
_register_pytree_with_jax(pytree_type, flatten_fn, unflatten_fn) | ||
|
||
|
||
def is_pytree(type_: type[Any]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check for some kind of internal structure here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the jax definition of a pytree I don't think so. Any container-like (list
, dict
) or registered object is a Pytree, even if they contain objects ("leaves") that aren't.
Co-authored-by: Utkarsh <utkarshazad98@gmail.com>
Looks good to me 👍 |
**Context:** PyTrees are an ideal format for dataset attributes, since they provide a compact representation of deeply nested structures, and Pennylane Operators can easily be converted to and from pytrees. Shortcut story: https://app.shortcut.com/xanaduai/story/63174/datasets-serialization-using-pytrees **Description of the Change:** - Adds a `pytree.serialization` module for converting `PyTreeStructure` objects to and from a JSON representation - Adds a `DatasetPyTree` data attribute type that can store any pytree-compatible type - `DatasetPyTree` is now the default attribute type for `Operator` - The `pytree.unflatten` method disables recording, to prevent deserialized operators from being queued **Benefits:** - Datasets support any Pennylane object with a Pytree representation and serializable metadata - Complex serialization logic in ``DatasetOperator`` can be deprecated - Reduces file size of datasets **Possible Drawbacks:** - Pytree datasets will not be compatible with versions < 0.37 - Type information for metadata is not preserved - `tuple`, `Wires`, and `Shots` objects will all be converted to `list` on a round-trip. `__init__` methods are responsible for ensuring that metadata is converted to the correct type --------- Co-authored-by: albi3ro <chrissie.c.l@gmail.com> Co-authored-by: Christina Lee <christina@xanadu.ai> Co-authored-by: Thomas R. Bromley <49409390+trbromley@users.noreply.github.com> Co-authored-by: Mudit Pandey <mudit.pandey@xanadu.ai> Co-authored-by: Paul Finlay <50180049+doctorperceptron@users.noreply.github.com> Co-authored-by: Diego <67476785+DSGuala@users.noreply.github.com> Co-authored-by: Utkarsh <utkarshazad98@gmail.com> Co-authored-by: Diego <diego_guala@hotmail.com>
Context:
PyTrees are an ideal format for dataset attributes, since they provide a compact representation of deeply nested structures, and Pennylane Operators can easily be converted to and from pytrees.
Shortcut story: https://app.shortcut.com/xanaduai/story/63174/datasets-serialization-using-pytrees
Description of the Change:
pytree.serialization
module for convertingPyTreeStructure
objects to and from a JSON representationDatasetPyTree
data attribute type that can store any pytree-compatible typeDatasetPyTree
is now the default attribute type forOperator
pytree.unflatten
method disables recording, to prevent deserialized operators from being queuedBenefits:
DatasetOperator
can be deprecatedPossible Drawbacks:
tuple
,Wires
, andShots
objects will all be convertedto
list
on a round-trip.__init__
methods are responsible for ensuring that metadata is converted to the correct type