Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reworked custom measurements #198

Closed
wants to merge 469 commits into from
Closed

Conversation

twop
Copy link

@twop twop commented Jul 10, 2022

Objective

Remove MeasureFunc and make custom measurements faster and mode intuitive

Fixed #57

Left:

  • add code docs and update changelog
  • make sure that names, implementation etc are aligned and agreed upon

Context

I decided to use the approach of storing custom data per measured element and providing an implementation of new Measure trait that has a method measure(&self, data: &StoredData, node: Node, constraint:Size<Option<f32>>) -> Size<f32>

Note that StoredData (just T in code) is going to be owned by Taffy and kept in SparseSecondaryMap. Usually it will be an id of some sort Measure trait implementation can understand.

For example in my project I need to provide measurements for text. Thus Measure trait will be implemented by Font struct and StoredData will be probably a tuple of (TextHandle, FontSize)

Note that StoredData can be an enum in case there are more than 1 type of objects that need to be measured

Feedback wanted

  • Names

    • I'm not a native speaker and not sure that Measure.measure makes sense from lang point of view, would be awesome if somebody more competent would say "totally" or "no, here is a better name"
    • I'm not happy with SImpleTaffy as an alias for Taffy<NoMeasure>. That might be important for demos and examples where majority of the cases do not involve custom measurements (thus, just Taffy might be better), on the other hand, in all more or less practical use cases there is a need for custom measurements (especially when dealing with text). Hence, it is possible that Taffy is a better alias for "unmeasured" case and something like MeasuredTaffy for "measured" case.
  • Architecture

    • I don't see a problem with the current approach when it comes to multithreading, but maybe I'm missing something critical. StoredData is going to be owned by Taffy and it is up to the caller to ensure threadsafety for Measure trait implementation if compute_measured_layout needs to be called from a different thread, in other words, multithreading is possible and it is completely outside of the library's API surface area.
    • I think with that approach there is no need for alloc feature, unless I'm missing something it was only needed for Boxed variant of MeasureFunc

@jkelleyrtp
Copy link
Member

Bevy would use Entity here, which can be stored as a u64; although we may need to make some changes downstream to make sure that the coercion required is public enough.

I think we're not on the same page here.
In this paradigm Taffy holds your id attached to Taffy's NodeData struct.

You are right! I misunderstood you example, my apologies.

I think I would be personally rate this approach the last (although higher than MeasureFunc ^_^ ). For my project it won't give me a benefit over storing Node -> MyData myself. Partially because I want to be able to store more that 8 bytes and partially I don't want to do decoding/encoding my metadata into u64. Again, in my particular project it is not an id but a collection of small properties (id is one of them though).

How are your properties stored? It sounds like you don't have any Nodes - or at least they're not accessible from a key?

@twop
Copy link
Author

twop commented Jul 10, 2022

How are your properties stored? It sounds like you don't have any Nodes - or at least they're not accessible from a key?

I have the vdom/reconciler loosely inspired by https://github.com/fitzgen/dodrio. Thus I detect atomic changes in attributes and treat the last vdom as a source of truth for some properties(not all though). So I know exactly when they change, thus when rendering the tree I can skip traversing purely layout properties and focus only on rendering. I hope that makes sense

@twop
Copy link
Author

twop commented Jul 10, 2022

I'm sorry if I'm being too pedantic here, happy to change the code they way you think is best.

@twop
Copy link
Author

twop commented Jul 10, 2022

How are your properties stored? It sounds like you don't have any Nodes - or at least they're not accessible from a key?

Oh, maybe one more relevant data point for the conversation: I don't use taffy as a source of truth for my UI hierarchy. I have my own stateful UI tree that has both Rect and Node fields, also some UI nodes don't have their own layout (such as lines between elements). I traverse the UI tree 60 times a second but relayout happens relatively infrequently. Thus, when I compute layout I store the outcome(rects) in my data structure that is optimized for traversal in Rect field (fewer CPU cache misses).

@jkelleyrtp
Copy link
Member

jkelleyrtp commented Jul 10, 2022

How are your properties stored? It sounds like you don't have any Nodes - or at least they're not accessible from a key?

I have the vdom/reconciler loosely inspired by https://github.com/fitzgen/dodrio. Thus I detect atomic changes in attributes and treat the last vdom as a source of truth for some properties(not all though). So I know exactly when they change, thus when rendering the tree I can skip traversing purely layout properties and focus only on rendering. I hope that makes sense

Well that sounds a lot like the internals of Dioxus!

https://github.com/DioxusLabs/dioxus/tree/master/packages/core

Internally we use a slab that maps ID to raw pointer (which admittedly takes some unsafe). You might eventually do the same since it's useful to keep track of nodes across reconciliation.

But, I do understand the problem - your IDs aren't stable between VDom cycles since it sounds like you're using references directly, and old references become invalid after reconciliation.

I would imagine a stable ID system being more performant than Boxing measurefuncs.

For this type of usecase a "visitor" taffy would make the most sense. IE taffy doesn't store any data and defers to your implementation instead.

I think ultimately we would prefer to have a visitor approach than to make the Taffy instance generic over an external type.

Would it be reasonable to provide the SparseSecondaryMap on create/remove methods rather than built into Taffy directly?

This way, generics could be moved to methods and Taffy would optionally fill in the sparse map for you?

Edit:

Now that I think about it, what's stopping you from maintaining the SecondarySparseMap yourself?

When you add or remove nodes from taffy, update your secondary map, and then pass the secondarymap into the measure callback? It really sounds like the SecondaryMap is doing all the bookkeeping.

@twop
Copy link
Author

twop commented Jul 10, 2022

Now that I think about it, what's stopping you from maintaining the SecondarySparseMap yourself?
When you add or remove nodes from taffy, update your secondary map, and then pass the secondarymap into the measure callback? It really sounds like the SecondaryMap is doing all the bookkeeping.

You are absolutely right. I tried to outline my thinking process about pros/cons of this approach in the comment above (#198 (comment))

It seems you both perceive the cost of having Taffy to be generic is significantly higher than potential benefits. Let me try to iterate having

// note no "T" or any stored data in the signature
pub trait Measure {
    fn measure(&self, node: Node, constraint: Size<Option<f32>>) -> Size<f32>;
}

And see where it leads us.

@alice-i-cecile
Copy link
Collaborator

Sounds good, I'm looking forward to seeing the results of the experiment!

@twop
Copy link
Author

twop commented Jul 10, 2022

Here is a draft how it might look like. I will outline a couple of moments in code inline comments

}

impl Taffy {
/// Updates the stored layout of the provided `node` and its children with no measurements
pub fn compute_layout(&mut self, node: Node, size: Size<Option<f32>>) -> Result<(), TaffyError> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method might be misleading because it can be called regardless whether or not Taffy has any nodes that require measurements. In other words, the type doesn't capture this requirements any more, thus maybe require Measure trait implementation for all cases?

/// Stores a boxed function
#[cfg(any(feature = "std", feature = "alloc"))]
Boxed(Box<dyn Measurable>),
pub struct NoMeasure {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to go if we decided this approach is better

}
}

struct NoopMeasure();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one can be probably made public

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree. Although I'd probably lean towards IdentityMeasure?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally

taffy.set_measure(node, Some(MeasureFunc::Raw(|_| Size { width: 100.0, height: 100.0 }))).unwrap();
taffy.compute_layout(node, Size::undefined()).unwrap();
measure.size = 100.;
taffy.mark_dirty(node).unwrap();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mark_dirty outlines another issue, that data needed for measurements now completely decoupled from the API, thus it might be easy to forget to call mark_dirty on the node,

Copy link
Collaborator

@Weibye Weibye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This direction seems better than the previous which used generics :)

How is the measure data stored? I'm not sure I properly understand how or if it is stored at all. Does it not need to be stored?

My primary concern atm is the has_measure field. It does not seem to be tied to any kind of data, so we would have to manually keep it in sync. Is there any way we can remove the has_measure by checking if the node stores any kind of measure data?

@twop
Copy link
Author

twop commented Jul 12, 2022

How is the measure data stored? I'm not sure I properly understand how or if it is stored at all. Does it not need to be stored?

With this approach measure data is managed completely outside of the library. E.g. a user has to store the mapping Map<Node, MeasureData> somewhere outside of the library.

The generic approach allowed to store data inside the library, hence the generic T that represents the type of the stored data.

My primary concern atm is the has_measure field. It does not seem to be tied to any kind of data, so we would have to manually keep it in sync. Is there any way we can remove the has_measure by checking if the node stores any kind of measure data?

In the current iteration the data is managed outside of the library, thus there is a need to mark the node that some measurements need to be provided (has_measure flag). On the other hand, if we do decided to store measure data inside the library, then there are only few nodes that require measurement. Let's assume that we decided to store u64 and represent it as Option<u64> then 1000 nodes will have at least 8Kb of memory overhead storing mostly None, that's why I decided to keep the measure data as a separate map.

Hope that context helps

@Weibye
Copy link
Collaborator

Weibye commented Jul 13, 2022

Hope that context helps

That helps a lot! Now it makes sense 😄

@geom3trik
Copy link
Collaborator

Incidentally this is exactly the approach we use in morphorm for what I believe is the same thing (we call it content_size). It's just a method on a Node trait which queries for the content size. https://github.com/vizia/morphorm/blob/c14e888b1d66719fc3f13ccdb1ea01a10368ef9e/src/node.rs#L69

@alice-i-cecile
Copy link
Collaborator

I strongly prefer the content_size name; I'd like to align on that (and put in a doc alias for MeasureFunc and the JS equivalent).

@nicoburns
Copy link
Collaborator

I would like to raise the issue of caching here. I imagine a lot of measure_funcs will want to cache not just the overall result of the computation, but intermediate computations too (for example, a text layout engine may want to cache word-breaking or text shaping data). Does this single function approach still work if the called function will internally need to mutate? Are we just expecting the measure_func to do it's own internal synchronisation here if necessary?

@twop
Copy link
Author

twop commented Aug 23, 2022

I would like to raise the issue of caching here. I imagine a lot of measure_funcs will want to cache not just the overall result of the computation, but intermediate computations too (for example, a text layout engine may want to cache word-breaking or text shaping data). Does this single function approach still work if the called function will internally need to mutate? Are we just expecting the measure_func to do it's own internal synchronisation here if necessary?

That is a very good point, my sketch does allow it conceptually but the type signature will needed to be slightly tweaked

// note adding "mut" keyword
pub trait Measure {
    fn measure(&mut self, node: Node, constraint: Size<Option<f32>>) -> Size<f32>;
}

Then the caching can be done inside the implementation of Measure trait

@twop
Copy link
Author

twop commented Jan 9, 2023

Hi all, I just took a look at the latest sources, and wanted to surface some thoughts/questions.

  1. I like the Tree trait idea, seems like it can be useful to "embed" layout engine into a tree structure that most UI frameworks already have.
  2. Taffy as a default implementation of it totally makes sense
  3. I wonder what and if should be done with MeasureFunc given the new architectural advancements. On the one hand now a potential perf loss is more scoped and isolated (I can make my own Tree implementation), on the other hand, most usages (I assume) would just use default Taffy implementation.
  4. So I wonder if the library should have a first class support for a different approach of custom measuring. It seems that the default implementation is not that much code to copy, but the api for manipulating children is robust, thus "copy pasting" it to make my own Tree implementation kinda feels off to me.

If there is an appetite for "4" (first class support of different measurement api for custom nodes), I'm happy to contribute, not exactly sure how the api might look like but happy to sketch out ideas if you think that there is value in that effort.

@alice-i-cecile
Copy link
Collaborator

I'd be happy to see what those ideas look like :) The existing custom measurement stuff has always felt unclear and poorly designed to me, so I'd be very keen to see if you can come up with something better.

@nicoburns
Copy link
Collaborator

@twop Hi! Great to have you back around the place :)

most usages (I assume) would just use default Taffy implementation

I've actually been assuming the opposite. Partly because I suspect most consumers of Taffy will be UI frameworks who as you say will likely already have their own storage setup. You can't really do much with just Taffy by itself? The exception would consumers who are using Taffy over FFI which probably won't support the full custom layout interface very well (except perhaps C++ FFI which we could likely expose it to).

now a potential perf loss is more scoped and isolated

Is there any reason to expect a performance loss here?

So I wonder if the library should have a first class support for a different approach of custom measuring.

Some thoughts:

  1. I really like the idea expressed in (MeasureFunc is confusing and hard to use #57 (comment)) which is to have measure_func be passed into the compute_layout function rather than stored on a node. In C++ (where the idea of measure_func's originated) you could happily store a reference to external state in a measure_func. But in Rust that doesn't really work due to borrow checker rules. And passing it in layout time gets around this as we can simply borrow the external state for the duration of the layout.
  2. I wonder if it would make sense to allow users of Taffy to store their own id on a node (perhaps as a u64). That would remove the need for them to maintain an internal mapping of Taffy node ids to their own nodes, and potentially cut out a hashmap lookup or similar for each measurement?
  3. Leading on from that, perhaps it would make to make the Taffy struct generic over the shape of the data stored for custom nodes (allow users to specify there own type).

I'm less sure about 3, but 1 and 2 seem like clear improvements to me.

It seems that the default implementation is not that much code to copy, but the api for manipulating children is robust, thus "copy pasting" it to make my own Tree implementation kinda feels off to me.

I agree. I wonder if it would be possible for us to modularise our storage backend and provide some kind of "lego building blocks for building your own storage". At some level though, I think this might be unnavoidable, and the best solution might be to create and maintain well documented "white box" example source code that is explicitly intended to be copy/pasted and customised by users of Taffy.

@nicoburns
Copy link
Collaborator

Closing as completed by #490 which ended up implementing a very similar solution to the one proposed here.

@nicoburns nicoburns closed this Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code quality Make the code cleaner or prettier. controversial This work requires a heightened standard of review due to implementation or design complexity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MeasureFunc is confusing and hard to use