Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream dependency replacement with walk all value nodes #3519

Merged
merged 21 commits into from
Jul 24, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

98 changes: 59 additions & 39 deletions packages/statemanager/src/stateManager.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import { Chain, Common } from '@ethereumjs/common'
import { RLP } from '@ethereumjs/rlp'
import { Trie, createTrieFromProof, verifyTrieProof } from '@ethereumjs/trie'
import { LeafNode, Trie, createTrieFromProof, verifyTrieProof } from '@ethereumjs/trie'
import {
Account,
Address,
Expand Down Expand Up @@ -39,6 +39,7 @@ import type {
StorageDump,
StorageRange,
} from '@ethereumjs/common'
import type { Nibbles, TrieNode } from '@ethereumjs/trie'
import type { DB, PrefixedHexString } from '@ethereumjs/util'
import type { Debugger } from 'debug'

Expand Down Expand Up @@ -984,26 +985,32 @@ export class DefaultStateManager implements EVMStateManagerInterface {
* Both are represented as hex strings without the `0x` prefix.
*/
async dumpStorage(address: Address): Promise<StorageDump> {
function nibblestoBytes(arr: Nibbles): Uint8Array {
const buf = new Uint8Array(arr.length / 2)
for (let i = 0; i < buf.length; i++) {
let q = i * 2
buf[i] = (arr[q] << 4) + arr[++q]
}
return buf
}

await this.flush()
const account = await this.getAccount(address)
if (!account) {
throw new Error(`dumpStorage f() can only be called for an existing account`)
}
const trie = this._getStorageTrie(address, account)

return new Promise((resolve, reject) => {
return new Promise((resolve, _) => {
const storage: StorageDump = {}
const stream = trie.createReadStream()

stream.on('data', (val: any) => {
storage[bytesToHex(val.key)] = bytesToHex(val.value)
})
stream.on('end', () => {
resolve(storage)
})
stream.on('error', (e) => {
reject(e)
})
return trie
.walkAllValueNodes(async (node: TrieNode, _) => {
if (node instanceof LeafNode) {
storage[bytesToHex(nibblestoBytes(node._nibbles))] = bytesToHex(node._value)
}
})
.then((_) => resolve(storage))
})
}

Expand All @@ -1017,6 +1024,15 @@ export class DefaultStateManager implements EVMStateManagerInterface {
The object will also contain `nextKey`, the next (hashed) storage key after the range included in `storage`.
*/
async dumpStorageRange(address: Address, startKey: bigint, limit: number): Promise<StorageRange> {
function nibblestoBytes(arr: Nibbles): Uint8Array {
acolytec3 marked this conversation as resolved.
Show resolved Hide resolved
const buf = new Uint8Array(arr.length / 2)
for (let i = 0; i < buf.length; i++) {
let q = i * 2
buf[i] = (arr[q] << 4) + arr[++q]
}
return buf
}

if (!Number.isSafeInteger(limit) || limit < 0) {
throw new Error(`Limit is not a proper uint.`)
}
Expand All @@ -1026,44 +1042,48 @@ export class DefaultStateManager implements EVMStateManagerInterface {
if (!account) {
throw new Error(`Account does not exist.`)
}

const trie = this._getStorageTrie(address, account)

return new Promise((resolve, reject) => {
return new Promise((resolve, _) => {
let inRange = false
let i = 0

/** Object conforming to {@link StorageRange.storage}. */
const storageMap: StorageRange['storage'] = {}
const stream = trie.createReadStream()

stream.on('data', (val: any) => {
if (!inRange) {
// Check if the key is already in the correct range.
if (bytesToBigInt(val.key) >= startKey) {
inRange = true
} else {
return
}
}

if (i < limit) {
storageMap[bytesToHex(val.key)] = { key: null, value: bytesToHex(val.value) }
i++
} else if (i === limit) {
return trie
.walkAllValueNodes(async (node: TrieNode, currentKey) => {
if (node instanceof LeafNode) {
// storage[bytesToHex(nibblestoBytes(node._nibbles))] = bytesToHex(node._value)

const keyBytes = nibblestoBytes(currentKey.concat(node.key()))
if (!inRange) {
// Check if the key is already in the correct range.
if (bytesToBigInt(keyBytes) >= startKey) {
inRange = true
} else {
return
}
}

if (i < limit) {
storageMap[bytesToHex(keyBytes)] = { key: null, value: bytesToHex(node._value) }
i++
} else if (i === limit) {
resolve({
storage: storageMap,
nextKey: bytesToHex(keyBytes),
})
}
}
})
.then((_) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are exposing a lot from Trie to the StateManager, also with all these imports above. Can we abstract this away one level more by adding an async method getStorageMap(): StorageMap to the Trie class and call that instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reiterate here: I would really like to update here!

I am still not sure e.g. if it is (was) a good idea that we instantiate a new object for every trie node walk throuh (and should rather go bare metal and use the plain Uint8Array or something for performance reasons), and if we so deep-use the trie internals it would make it harder to encapsule respectively to refactor.

I also do think that regarding verkle statefull integration it will be beneficial if we stay 1:1 with as many methods as possible and this one would also be a candidate for a 1:1 replacement if we add this higher level method.

(let me know if there are downsides I might not see atm)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% following your thoughts here. Is your basic concern that we're calling walkAllValueNodes here and have to use TrieNode and other related types/helpers from trie inside dumpStorageRange? I'm not seeing where we instantiate a heavy new trie object (unless you're talking about further down in the process of walking the trie - so maybe lookupNode?) If so, that feels like something we should handle as a separate PR. Otherwise, yes, seems like we could certainly encapsulate the logic here in getStorageMap as you described and add a range filter as a parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your basic concern that we're calling walkAllValueNodes here and have to use TrieNode and other related types/helpers from trie inside dumpStorageRange

Yes, that's my main concern.

We make heavy use of API parts from Trie reaching deep into Trie and which might not have been a good idea to have been exposed in the first place, and if we officially use these ourselves we are stuck with these structures until the next breaking release round.

So I would strongly prefer here to use/introduce a higher level and more abstracted API call into trie (and, again: which then might be the same for Verkle) which does not make so much use of Trie (half-)internals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, to re-iterate on the last part: a higher level getStorageMap() method already lays the ground for more shared state manager code by stateful verkle and MPT, while this specifc Trie code drives us further apart from that. So I think that is the better abstraction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, yes, we doesn't need to decide/commit to verkle at this point

Copy link
Contributor Author

@scorbajio scorbajio Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea would be having a single getStorageMap function that allows you to pass in a range and get a storageMap in return, if no range is passed in default to the full range, which is to say a full dump of storage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess so, do not have this fully on the plate, but basically a method in the Trie class which can then cover all the use cases here, and each optimally with a one-line-code call into trie.

You can also comment if this makes sense or if I am overlooking something. If you think this is feasible it would be great if you can update on the PR!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea would be having a single getStorageMap function that allows you to pass in a range and get a storageMap in return, if no range is passed in default to the full range, which is to say a full dump of storage?

Yes, this is exactly it. Just copy your code from dumpStorageRange and put it in its own function on the trie class with optional start and ends for the range..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone please actually DO it? 🙂😆 This is not more that an hours work and it would be really good if we get this one merged.

resolve({
storage: storageMap,
nextKey: bytesToHex(val.key),
nextKey: null,
})
}
})

stream.on('end', () => {
resolve({
storage: storageMap,
nextKey: null,
})
})
stream.on('error', (e) => reject(e))
)
})
}

Expand Down
3 changes: 1 addition & 2 deletions packages/trie/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -59,8 +59,7 @@
"@types/readable-stream": "^2.3.13",
"debug": "^4.3.4",
"lru-cache": "10.1.0",
"ethereum-cryptography": "^2.2.1",
"readable-stream": "^3.6.0"
"ethereum-cryptography": "^2.2.1"
},
"devDependencies": {
"@ethereumjs/genesis": "^0.2.2",
Expand Down
9 changes: 0 additions & 9 deletions packages/trie/src/trie.ts
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,6 @@ import { verifyRangeProof } from './proof/range.js'
import { ROOT_DB_KEY } from './types.js'
import { _walkTrie } from './util/asyncWalk.js'
import { bytesToNibbles, matchingNibbleLength } from './util/nibbles.js'
import { TrieReadStream as ReadStream } from './util/readStream.js'
import { WalkController } from './util/walkController.js'

import type {
Expand Down Expand Up @@ -1081,14 +1080,6 @@ export class Trie {
return true
}

/**
* The `data` event is given an `Object` that has two properties; the `key` and the `value`. Both should be Uint8Arrays.
* @return Returns a [stream](https://nodejs.org/dist/latest-v12.x/docs/api/stream.html#stream_class_stream_readable) of the contents of the `trie`
*/
createReadStream(): ReadStream {
return new ReadStream(this)
}

/**
* Returns a copy of the underlying trie.
*
Expand Down
1 change: 0 additions & 1 deletion packages/trie/src/util/index.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
export * from './encoding.js'
export * from './genesisState.js'
export * from './readStream.js'
export * from './tasks.js'
export * from './walkController.js'
66 changes: 0 additions & 66 deletions packages/trie/src/util/readStream.ts

This file was deleted.

Loading