Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree arena #752

Merged
merged 16 commits into from
Dec 5, 2024
Merged

Tree arena #752

merged 16 commits into from
Dec 5, 2024

Conversation

KGrewal1
Copy link
Contributor

I've had a try at implementing the unsafe tree_arena in a separate lib (but in the same workspace for now) - haven't thought of a good way to make non root access not O(depth), without slowing down insertion or using more memory, though have improved accessing nodes from the root to O(1) from O(depth) and accessing direct children to O(1) from O(children)

@KGrewal1
Copy link
Contributor Author

KGrewal1 commented Nov 17, 2024

I am definitely doing something silly to break the needed preconditions, just need to work out what

Edit:
Think its just the reference to the Item being moved on insertion of new items, as its a reference to an element of the hashmap which can be invalidated

@PoignardAzur
Copy link
Contributor

PoignardAzur commented Nov 18, 2024

haven't thought of a good way to make non root access not O(depth), without slowing down insertion or using more memory,

The eventual plan will be to use a slotmap.

EDIT: Wait, no, I misunderstood the question.

@PoignardAzur
Copy link
Contributor

(Copying from zulip)

Your PR looks good, but we'd probably want more intermediary steps before relying on custom unsafe code; especially since the TreeArena API isn't quite settled yet.

Those steps would be:

  • Moving TreeArena to a separate crate without changing the code.
  • Adding a test suite.
  • Adding a command-line flag so we can switch between the safe implementation and the efficient implementation.

The test suite would help us test the unsafe version with MIRI, which we'd probably add to CI.

@KGrewal1 KGrewal1 force-pushed the tree_arena branch 3 times, most recently from 582557d to 4f7a05f Compare November 25, 2024 11:56
Copy link
Contributor

@PoignardAzur PoignardAzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very high-quality contribution, and will be good to merge once the review points are addressed.


Something orthogonal to this review is the question of how the TreeArena API will evolve.

Having an unsafe implementation may slow us down if we want to try API changes and we have to make sure they're implemented in both versions.

I'd like to establish that the safe version will be a first-class citizen and the unsafe version a second-class citizen:

  • The safe version may have features / APIs that the unsafe version doesn't yet have.
  • If both versions are at feature parity, Masonry can switch on the unsafe version for best performance.
  • Otherwise, Masonry uses the safe version.

If that's what we go with, we should documented that pattern in a few places.

Comment on lines 3 to 4
## Architecture

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, the point of an ARCHITECTURE.md file is to give just enough context to understand everything about a codebase, not just the public API.

As such, I think this file should start with a description of the safe-and-unsafe implementations and why we include both.

Comment on lines 29 to 33
Of finding children: $O(1)$ - previously $O(\text{children})$

Of finding deeper descendants: $O(\text{depth})$ - ideally will be made $O(1)$

Access from the root: $O(1)$, previously $O(\text{depth})$ - improved as all nodes are known to be descended from the root
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid Latex for this section. ARCHITECTURE.md is mostly meant to be read in a code editor, you can use plain text.

Also, I think the descriptions could be a little clearer.

Copy link
Member

@DJMcNab DJMcNab Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see an argument for using the GitHub Markdown MathJax extension. But for big-O notation, $$O(1)$$ or $$\mathcal{O}(1)$$ is not that much better than O(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further thinking, given readme is so bare, does it make more sense to flesh this out more, but also move it into Readme rather than having a superfluous file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The general workflow I like is "README for things you need to understand the project, ARCHITECTURE for things you need to understand the project's internals that you wouldn't get from the README or the doc root".

Comment on lines 13 to 14
It is possible to get shared (immutable) access or exclusive (mutable) access to the tree. These return `ArenaRef<'arena, T>` or `ArenaMut<'arena, T>` respectively

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this description could go into a little more detail about how the shared/exclusive access works, and why this crate is worth bothering with.

The general idea of TreeArena is to express a tree-like ownership graph, stored inside a flat data structure. The UnsafeCells are meant to bridge the gap between the two.

tree_arena/src/lib.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_safe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/tests/basic_tests.rs Outdated Show resolved Hide resolved
masonry/src/contexts.rs Outdated Show resolved Hide resolved
@KGrewal1
Copy link
Contributor Author

Regarding testing with miri in CI, the invocation I think is:

cargo +nightly miri t -p tree_arena --no-fail-fast --no-default-features  

but I'm unsure how best to access nightly rust in the workflow to run this

Comment on lines 172 to 177
/// # SAFETY
///
/// When using this on [`ArenaMutChildren`] associated with some node,
/// must ensure that `id` is a descendant of that node, otherwise can
/// obtain two mutable references to the same node
unsafe fn find_mut(&mut self, id: impl Into<NodeId>) -> Option<ArenaMut<'_, T>> {
Copy link

@Imberflur Imberflur Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I have a minor note on the safety documentation. This is sort of a comment out of nowhere, but I've developed an interest in reviewing unsafe code and I've been following the development in this repo.

Since this retains the full &mut self inside the returned ArenaMutChildren, I think the safety requirements are more extensive than the case described here. Any use of parent_arena in ArenaMutChildren needs to be careful to not invalidate the reference produced from unsafe { item.get().as_mut() } below. E.g. DataMap::find also must only be called on descendants, parent_arena.items.remove() must only operate on descendants, parent_arena.items.insert() must not overwrite existing nodes, parent_arena.items must be used to clear the whole structure, etc.

I was initially thinking the safety documentation could be reworked to note the overall requirement that the parent_arena field must not be used to invalidate the returned item reference and then include a few examples. However, I think all call sites would have the same safety note pointing to the details of ArenaMutChildren. So since all the details of ArenaMutChildren are private to this module, find_mut could be safe, and the unsafe block in it could be documented with the fact that all operations that ArenaMutChildren allows only access descendants and never invalidate the item reference created here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the overall point that the safety requirement should be better put as that you can only access children or remove children of the current node (I'm not sure whether insert() comment would be needed as it is a panic to insert a node with the same name as the key, and not sure clearing is different to removal as there isn't a clear api, and I assume any would be an action on the Arena itself thus checked by the borrow checker as having a reference to an item in the tree prevents any mutable access of the tree itself)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding safety, I think this makes sense, but can also see another argument that it being unsafe is correct in the case it did become public (though assume then the same should be true of the immutable method otherwise could obtain a immutable and mutable reference to the same node)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether insert() comment would be needed as it is a panic to insert a node with the same name as the key,

there isn't a clear api, and I assume any would be an action on the Arena itself thus checked by the borrow checker as having a reference to an item in the tree prevents any mutable access of the tree itself

These are comments about what the safe code in ArenaMutChildren is allowed to do with the internal items hashmap (which has things such as insert without such checks and clear).

Copy link
Contributor

@PoignardAzur PoignardAzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending changing the find methods and removing the children lists.

You can consider everything else optional before merging.

tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Outdated Show resolved Hide resolved
tree_arena/src/tree_arena_unsafe.rs Show resolved Hide resolved
tree_arena/tests/basic_tests.rs Show resolved Hide resolved
@PoignardAzur
Copy link
Contributor

Another blocker before this is merged: I'd like for some version of my comment to show up in the documentation:

Having an unsafe implementation may slow us down if we want to try API changes and we have to make sure they're implemented in both versions.

I'd like to establish that the safe version will be a first-class citizen and the unsafe version a second-class citizen:

* The safe version may have features / APIs that the unsafe version doesn't yet have.

* If both versions are at feature parity, Masonry can switch on the unsafe version for best performance.

* Otherwise, Masonry uses the safe version.

If that's what we go with, we should documented that pattern in a few places.

@KGrewal1
Copy link
Contributor Author

Added to the readme and the doc comment at the root of the crate

Copy link
Contributor

@PoignardAzur PoignardAzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I'd maybe like a comment noting that TreeNode::children needs to be removed but otherwise we're good.

@KGrewal1
Copy link
Contributor Author

KGrewal1 commented Nov 29, 2024

I'm not sure TreeNode::children can be removed yet as it is being used on node removal (as we remove all descended nodes) - I think the alternative would be iterating over the whole hashmap to find which children have the current node as their parent but that would definitely be inefficient?

@KGrewal1 KGrewal1 force-pushed the tree_arena branch 4 times, most recently from 78ccfcd to 6f503e9 Compare December 3, 2024 12:24
Copy link
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to approve the unsafe code, as I don't fully understand it.

However, given that we use the entirely safe version by default, it seems low-risk enough to land this. Not approving because of some of the docs concerns

tree_arena/Cargo.toml Show resolved Hide resolved
tree_arena/Cargo.toml Show resolved Hide resolved
// Copyright 2024 the Xilem Authors
// SPDX-License-Identifier: Apache-2.0

//! This crate implements a tree data structure for use in Masonry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to use cargo-rmde here; it's already set up in CI.

That doesn't need to block this PR, but it would be a good follow-up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity is that the content after the licence in the other lib.rs eg lines 4-22 in https://github.com/linebender/xilem/blob/main/xilem_core/src/lib.rs ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. This is confusing.

No, this is the setup used specifically in the Masonry crate. The version in Xilem Core is an old version of that

tree_arena/src/tree_arena_unsafe.rs Show resolved Hide resolved
#[derive(Debug)]
struct TreeNode<T> {
item: T,
children: Vec<NodeId>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a little bit surprising to me...

#[derive(Debug)]
struct DataMap<T> {
/// The items in the tree
items: HashMap<NodeId, Box<UnsafeCell<TreeNode<T>>>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we be able to have something like an UnsafeMutex type, ideally which is implemented safely by default but with an unsafe passthrough version, to check some that invariants are being met.
That is, have this same implementation, but in a checked manner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but would require quite a few changes as the TreeNode contains the contents themselves and the list of children, but the API returns these in separate structs - the item and the ArenaChildren (mut or ref) so I think both would need to be wrapped in some form of mutex for run time checking separately ?

@DJMcNab
Copy link
Member

DJMcNab commented Dec 5, 2024

I've done some spot checks of a few examples, and this all seems to work. As discussed on Zulip, we'll land this now, because the unsafe code is off by default.

@DJMcNab DJMcNab added this pull request to the merge queue Dec 5, 2024
Merged via the queue into linebender:main with commit a1d47a0 Dec 5, 2024
17 checks passed
@KGrewal1 KGrewal1 deleted the tree_arena branch December 5, 2024 10:53
github-merge-queue bot pushed a commit that referenced this pull request Dec 5, 2024
Use `cargo rdme` for crate readme and check in CI as mentioned in
#752 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants