Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream dependency replacement with walk all value nodes #3519

Merged
merged 21 commits into from
Jul 24, 2024

Conversation

scorbajio
Copy link
Contributor

@scorbajio scorbajio commented Jul 18, 2024

This change picks up where #3507 left off, but with a change of plan to use walkAllValueNodes to implement storage-dump functionality, and remove the stream dependency currently being used.

@ScottyPoi ScottyPoi force-pushed the stream-dependency-replacement-with-walkAllValueNodes branch from 400d040 to 29a3468 Compare July 18, 2024 20:36
Copy link

codecov bot commented Jul 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.37%. Comparing base (fb50628) to head (97bbf25).
Report is 3 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

Flag Coverage Δ
client ?
tx 78.49% <ø> (+0.05%) ⬆️
vm 58.50% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

@acolytec3
Copy link
Contributor

Checked the bundle size with this change and it looks like it reduces the trie bundle by about 30% with removing the readable-stream dependency.

acolytec3
acolytec3 previously approved these changes Jul 19, 2024
Copy link
Contributor

@acolytec3 acolytec3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work here.

@acolytec3 acolytec3 force-pushed the stream-dependency-replacement-with-walkAllValueNodes branch from d4d18eb to fad5592 Compare July 19, 2024 00:42
acolytec3
acolytec3 previously approved these changes Jul 19, 2024
Copy link
Member

@holgerd77 holgerd77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I would at least like to discuss, otherwise this looks great! 🤩

}
}
})
.then((_) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are exposing a lot from Trie to the StateManager, also with all these imports above. Can we abstract this away one level more by adding an async method getStorageMap(): StorageMap to the Trie class and call that instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reiterate here: I would really like to update here!

I am still not sure e.g. if it is (was) a good idea that we instantiate a new object for every trie node walk throuh (and should rather go bare metal and use the plain Uint8Array or something for performance reasons), and if we so deep-use the trie internals it would make it harder to encapsule respectively to refactor.

I also do think that regarding verkle statefull integration it will be beneficial if we stay 1:1 with as many methods as possible and this one would also be a candidate for a 1:1 replacement if we add this higher level method.

(let me know if there are downsides I might not see atm)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% following your thoughts here. Is your basic concern that we're calling walkAllValueNodes here and have to use TrieNode and other related types/helpers from trie inside dumpStorageRange? I'm not seeing where we instantiate a heavy new trie object (unless you're talking about further down in the process of walking the trie - so maybe lookupNode?) If so, that feels like something we should handle as a separate PR. Otherwise, yes, seems like we could certainly encapsulate the logic here in getStorageMap as you described and add a range filter as a parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your basic concern that we're calling walkAllValueNodes here and have to use TrieNode and other related types/helpers from trie inside dumpStorageRange

Yes, that's my main concern.

We make heavy use of API parts from Trie reaching deep into Trie and which might not have been a good idea to have been exposed in the first place, and if we officially use these ourselves we are stuck with these structures until the next breaking release round.

So I would strongly prefer here to use/introduce a higher level and more abstracted API call into trie (and, again: which then might be the same for Verkle) which does not make so much use of Trie (half-)internals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, to re-iterate on the last part: a higher level getStorageMap() method already lays the ground for more shared state manager code by stateful verkle and MPT, while this specifc Trie code drives us further apart from that. So I think that is the better abstraction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, yes, we doesn't need to decide/commit to verkle at this point

Copy link
Contributor Author

@scorbajio scorbajio Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea would be having a single getStorageMap function that allows you to pass in a range and get a storageMap in return, if no range is passed in default to the full range, which is to say a full dump of storage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess so, do not have this fully on the plate, but basically a method in the Trie class which can then cover all the use cases here, and each optimally with a one-line-code call into trie.

You can also comment if this makes sense or if I am overlooking something. If you think this is feasible it would be great if you can update on the PR!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea would be having a single getStorageMap function that allows you to pass in a range and get a storageMap in return, if no range is passed in default to the full range, which is to say a full dump of storage?

Yes, this is exactly it. Just copy your code from dumpStorageRange and put it in its own function on the trie class with optional start and ends for the range..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone please actually DO it? 🙂😆 This is not more that an hours work and it would be really good if we get this one merged.

@holgerd77
Copy link
Member

Yeah, actually this looks really, really great! 🎉

So this is for the VM.

Before:

grafik

After:

grafik

Note that both the readable-stream code and the Buffer polyfill are gone!!!

Copy link
Contributor

@acolytec3 acolytec3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@acolytec3 acolytec3 merged commit 6a68531 into master Jul 24, 2024
34 checks passed
@acolytec3 acolytec3 deleted the stream-dependency-replacement-with-walkAllValueNodes branch July 24, 2024 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants