
Fix slow flexbox layout when auto sized #504

Closed

Conversation


@matthewgapp matthewgapp commented Jun 26, 2023

Objective

This PR is mostly for discussion of a better caching solution that should address #502.

This PR introduces a call frame cache that lives only for the duration of a given layout pass. It's implemented purely internally (in a thread local) to avoid breaking changes to the public APIs.
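The idea of a thread-local call frame cache can be sketched as below. This is a minimal illustration, not the PR's actual code: the `CacheKey` fields, the `(f32, f32)` result type, and `cached_layout` are all hypothetical names, and a real implementation would also clear the map at the start or end of each layout pass.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical cache key: a node id plus the sizing constraints,
// with the f32 constraints converted to bits so the key is hashable.
#[derive(Hash, PartialEq, Eq, Clone)]
struct CacheKey {
    node: usize,
    available_width: u32,  // f32::to_bits of the width constraint
    available_height: u32, // f32::to_bits of the height constraint
}

thread_local! {
    // Thread-local storage keeps the cache out of the public API.
    static FRAME_CACHE: RefCell<HashMap<CacheKey, (f32, f32)>> =
        RefCell::new(HashMap::new());
}

/// Return the cached size for `key`, computing and storing it on a miss.
fn cached_layout(key: CacheKey, compute: impl FnOnce() -> (f32, f32)) -> (f32, f32) {
    FRAME_CACHE.with(|cache| {
        if let Some(size) = cache.borrow().get(&key) {
            return *size;
        }
        let size = compute();
        cache.borrow_mut().insert(key, size);
        size
    })
}
```

On a repeated call with the same key, the stored result is returned and the `compute` closure is never invoked, which is where the speedup for recursive layout comes from.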

It has switches that conditionally engage the cache depending on whether the flexbox layout algorithm is recursing. But I'm sure these "on" switches are just a rough approximation of where the cache should actually be engaged.

I originally developed this against the 0.3.11 sha (0bccf6c) (the old, fast branch) and saw a 5-10x speedup for flex auto use cases. After rebasing onto main, I no longer see a considerable speedup (only about 10%), and I suspect the additional work of checking whether to cache, plus the caching itself, slows down other non-flex-auto paths. It appears that compute_node_layout is now called with a more diverse range of values than before, causing the cache to miss and bloat over the course of a layout pass.

It feels like there should be a way to cache the results both performantly and robustly. In testing, style changes like flex direction and margins had surprising impacts on performance; caching should "just work" regardless of the current combination of style settings.

Context

Discuss any context that may be needed for reviewers to understand the changes you've made.
This may include related issues, previous discussion, or links to documentation or code.

Feedback wanted

I hope this PR inspires an implementation that works.

added flex variants and cleaned up organization of tests

added prototype caching

cleaned up increment and decrement functions

using ordered float to hash but this is 3 - 4x slower than converting to u32

add hashbrown, a faster hashmap

added some escape hatches to caching

more cleanups

refactored stack frame cache stuff into its own module
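The "ordered float" vs. "converting to u32" trade-off in the commit messages comes up because `f32` implements neither `Hash` nor `Eq`, so it can't be a `HashMap` key directly. A minimal sketch of the bit-conversion approach (the function name here is illustrative):

```rust
use std::collections::HashMap;

// Reinterpret each f32 constraint as its raw u32 bit pattern via
// f32::to_bits, so the key hashes as a plain integer. Crates like
// `ordered-float` wrap the float instead, but pay comparison overhead
// on every hash, which matches the 3-4x slowdown noted above.
fn size_key(width: f32, height: f32) -> (u32, u32) {
    // Caveat: 0.0 and -0.0 have different bit patterns (as do distinct
    // NaNs), so they land in different cache slots. That is acceptable
    // for a layout cache, where bit-identical inputs are the common case.
    (width.to_bits(), height.to_bits())
}
```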
@matthewgapp matthewgapp changed the title Matt/fix/slow flexbox layout Fix slow flexbox layout when auto sized Jun 26, 2023
@nicoburns
Collaborator

@matthewgapp Have you seen https://github.com/DioxusLabs/taffy/blob/main/src/tree/cache.rs ? I believe this implements a cache similar to the one in this PR, except:

  • Without the need for thread locals, because it stores the cache on the node itself in the tree (next to its styles, etc.)
  • With a more principled approach to caching that attempts to match the calling patterns the algorithms actually use (see the comment starting "Caching Strategy" in the link above).
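The per-node scheme described in these bullets can be sketched roughly as follows. This is an illustration of the shape of such a cache, not Taffy's actual cache.rs: the type names, the two-slot array, and the eviction policy are all assumptions made for the example.

```rust
// One cached measurement: the known-dimension inputs and the resulting size.
#[derive(Clone, Copy, PartialEq, Debug)]
struct CacheEntry {
    known_width: Option<f32>,
    known_height: Option<f32>,
    result: (f32, f32),
}

// Stored on each node alongside its styles, so no thread locals are needed.
#[derive(Default)]
struct NodeCache {
    // A small fixed number of slots for intermediate measurements keeps
    // lookups O(1) with no hashing; sized to the algorithms' call patterns.
    measure_entries: [Option<CacheEntry>; 2],
    final_layout: Option<CacheEntry>,
}

impl NodeCache {
    fn get(&self, kw: Option<f32>, kh: Option<f32>, final_pass: bool) -> Option<(f32, f32)> {
        let slots: &[Option<CacheEntry>] = if final_pass {
            std::slice::from_ref(&self.final_layout)
        } else {
            &self.measure_entries
        };
        slots
            .iter()
            .flatten()
            .find(|e| e.known_width == kw && e.known_height == kh)
            .map(|e| e.result)
    }

    fn store(&mut self, kw: Option<f32>, kh: Option<f32>, final_pass: bool, result: (f32, f32)) {
        let entry = Some(CacheEntry { known_width: kw, known_height: kh, result });
        if final_pass {
            self.final_layout = entry;
        } else {
            // Fill an empty slot, else overwrite slot 0 (illustrative policy).
            let idx = self.measure_entries.iter().position(|e| e.is_none()).unwrap_or(0);
            self.measure_entries[idx] = entry;
        }
    }
}
```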

My feeling is that it ought not to be possible to greatly improve on that (unless it contains a bug, which is entirely possible), but it would be interesting to bisect which commit breaks your perf improvements. Better perf would obviously be great if we can get it to work without sacrificing correctness.

We could also potentially add an opt-in mode (perhaps specified as a non-standard style) that disables some features of the algorithm(s) in return for better perf. This would likely involve tweaking the algorithms to measure their children less (and/or in more cache-friendly ways) rather than tweaking the caching itself.

@alice-i-cecile alice-i-cecile added the performance Layout go brr label Jul 4, 2023