Update state manager documentation (#707)

* Draft * Update project-words * Edit accumulator description * Add diagrams * adding mermaid-specific size containers * oops, left the <center> tag on one of the diagrams, fixed * lint fix * Addressing comments * Correct typo * Addressing comments * Address typo --------- Co-authored-by: sammysango <136692631+sammysango@users.noreply.github.com>
Consensys · Sep 17, 2024 · d6f348d · d6f348d
1 parent 02644d1
commit d6f348d
Show file tree

Hide file tree

Showing 4 changed files with 2,961 additions and 540 deletions.
diff --git a/docs/architecture/stack/evm-state-manager/index.mdx b/docs/architecture/stack/evm-state-manager/index.mdx
@@ -1,31 +1,253 @@
 ---
-title: EVM State Manager
+title: EVM state manager
+description: How state management works on Linea
 sidebar_position: 4
 image: /img/socialCards/evm-state-manager.jpg
 ---
 
-## EVM State Manager
+The state manager is the part of the execution client responsible for updating the state of the 
+network globally, and the state of every account individually. The state manager also audits the
+"read" access made in the EVM, meaning it monitors, verifies, and logs all operations where the
+EVM needs to read data from the blockchain state.
 
-### What is it?
+:::info
 
-The part of the execution client responsible for updating the state of the network globally, and the state of every account individually.
+"State" refers to the data stored on the blockchain at any given point in time. To
+update state is to update the record of the contents of every account whose contents have 
+changed. 
 
-### What does it do?
+:::
 
-Receives blocks that have been executed by the sequencer, and uses the trace data from their execution to update the state of the network. It then passes this updated network state information to the prover, for subsequent submission to Ethereum
+The main task of the state manager is to receive blocks that have been executed by the [sequencer](../sequencer/index.mdx) 
+and use the trace data from their execution to update the state of the network. Linea uses two 
+data structure types to manage state: 
+1. A Merkle-Patricia Trie to record the world state, maintain consensus, and process blocks. This 
+mirrors how consensus and state are [managed on Ethereum Mainnet](https://ethereum.org/en/developers/docs/data-structures-and-encoding/patricia-merkle-trie/). 
+2. A variant of regular Merkle trees called a sparse Merkle tree (SMT), which is used to more 
+efficiently track, manage, and update storage slots representing accounts. 
 
-### How does it do it?
+It then passes this updated network state information to the [prover](../trace-expansion-proving/index.mdx) 
+in the form of Merkle proofs for submission to Ethereum Mainnet (L1). 
 
-Through the Promethean power of cryptography and the careful stewardship of Merkle trees 😝
+Below, we'll explain the element of Linea's state management in greater detail, focussing on the 
+SMT configuration that sets Linea apart.
 
-The state of every account in the zkEVM is represented by a hash: a unique, encrypted and brief identifier. Because of the way hashes work, any change in the state of an account will result in a changed, but still unique and encrypted, hash.
+## Merkle trees
 
-The relationship between accounts–which accounts control which ones, for example–is represented by a tree structure. The way a small twig is derived from a larger branch, and that from a trunk: in this way, the entire bifurcating, iterative history of the network state is retained.
+The Merkle tree and its variations are commonly used across EVM chains to store and retrieve 
+data about the state of every account on the blockchain. 
 
-linea-besu uses a particular version of this technology called a Sparse Merkle Tree: it uses default values to represent certain levels of branching in the tree–and if there has been no change to that default value, it means there has been no activity “further out the branch”, and therefore no need to store data regarding it. This allows the network to be much more efficient, at the level of data storage and other improvements based on that, than other implementations of Merkle trees.
+A Merkle tree is comprised of 'nodes' that branch off from each other. At the base is the 'root', 
+or state root, from which branches stem, and leaves stem from the branches. 
 
-The state manager in linea-besu is relatively simple, and has two main functions: updating the state of the network, and proof generation.
+<div class="img-large">
+ <div class="mermaid-medium">
+ ```mermaid
+ flowchart TD
+ A[root] --> B[node] & C[node]
+ B --> D[node] & E[node]
+ C --> F[node] & G[node]
+ ```
+ </div>
+</div>
 
-- The sequencer executes a block, and sends it not only to the trace generator for it to do its job, but also to the state manager. Upon receiving an executed block from the sequencer, the state manager updates the state, in the Merkle tree, of every account that was affected, as documented in the trace data.
-- The new values, represented by new hashes, are now the state of the network following that block.
-- This new network state is necessary to generate a proof for submission to Ethereum. Therefore, the state manager passes the information off to the prover.
+Each node, regardless of type, is represented by a cryptographic hash which encodes data about its 
+properties — for example, the contents of your account. Each hash encodes the hashes of its child 
+nodes. Taken to its full extent, this cascading system means the root encodes data of the state of 
+every single account on the blockchain. 
+
+Cryptographic hashes are deterministic, which means you can reverse the hash function to get the 
+data which it encoded. If you have the hash of the root—the only node without a parent—you can 
+theoretically derive from it the data of any node in the entire tree. 
+
+As a layer 2 (L2) network, Linea is in the business of making transacting faster and more efficient.
+Linea implements a sparse Merkle tree to track account state and generate and store proofs, and 
+unlock greater efficiency when compared to standard Merkle trees, which require recomputation for 
+every block, leading to excessive computational demands.
+
+### Sparse Merkle trees
+
+Linea's state management uses sparse Merkle trees to minimize computation and contribute to the 
+blockchain's efficiency. 
+
+A sparse Merkle tree is a variation of a standard Merkle tree where not all leaf nodes are filled 
+with data; instead, data is only stored in nodes where it's needed. It is a complete tree of fixed 
+depth, meaning that all branches of the tree have the same length—i.e. the same number of leaves. 
+
+At initialization—the beginning of the chain's history—all leaf nodes are set to a default value, 
+which is typically a hash of a specific value, such as zero. Because all leaf nodes have the same 
+hash value, the parent nodes and higher-level nodes also have the same hash value. A node whose 
+hash is the default value for its level is therefore considered to represent an _empty_ subtree. 
+
+<div class="center-container">
+ <div class="mermaid-medium">
+ ```mermaid
+ flowchart TD
+ A[root] --> B["`**node A**`"
+ null] & C["`**node B**`"
+ contains data]
+ B --> D["`**node C**`":
+ null] & E["`**node D:**`"
+ null]
+ C --> F["`**node E**`":
+ data] & G["`**node F**`":
+ data]
+ ```
+ </div>
+</div>
+
+In the example above, the children of node A (leaves) contain null values, which means node A does 
+too. Node B, meanwhile, reflects that its children also contain values.
+
+With this construction, we do not need to keep track of every individual node's hash. Instead, we 
+can assume hashes that reflect the default value are empty, and the subtree or node that lies 
+further down the chain of nodes can be disregarded; we only need to pay attention to the ones that 
+correspond to _non-empty_ subtrees.
+
+## Cryptographic accumulator
+
+In this context, we can consider Linea's sparse Merkle tree as a type of "cryptographic 
+accumulator". A cryptographic accumulator is a type of cryptographic primitive encoding a collection 
+of items into very short strings and allowing read/write operations to be proven. Merkle trees and 
+sparse Merkle trees are elementary examples of accumulators but there are others with more powerful 
+capabilities.
+
+Linea's state manager uses an extended version of a sparse Merkle tree that enables it to prove all 
+CRUD (create, read, update, delete) operations for a key-addressed database. As an outline, the 
+construction uses a sparse Merkle tree to store the nodes of a sorted doubly-linked list that 
+encodes all the non-zero items of the state.
+
+Linea's state manager uses the accumulator to track the account trie of Linea but also the storage 
+of every contract separately.
+
+The leaves of the tree have the following structure: `prev || next || hKey || hVal`.
+
+`hKey` and `hVal` are the hashes of the key and the value of the stored state entry, respectively. 
+`prev` and `next` are pointers storing the position of the leaves whose `hKeys` are immediately 
+lower and higher, respectively, following lexicographic order. The first two leaves of the SMT are 
+called the head and the tail, and are special in that they do not encode a stored tuple. The head is 
+the lowest possible `hKey`, while the tail is the highest possible `hKey`. They are therefore 
+situated at the beginning and the end of the linked list, respectively. Starting from the head, we 
+can access the SMT leaf stored at `head.next` to get the lowest "actually stored" item. Further 
+incrementing the `next` value will give us the second-lowest stored item and so on. Repeating the 
+process walks us through the entire set of stored items before we end up at the tail node, marking 
+the final step.
+
+Leaves can also be referred to as storage slots, in that they contain data about the contents of 
+the account in question.
+
+### Tracking empty leaves
+
+All leaves in the tree are populated with default/zero values at initialization. Since a 
+deterministic hashing function will ensure that these leaves are always represented by the same 
+hash, empty leaves can be easily recognized by the accumulator. 
+
+However, in order for the state manager to update a storage slot with data about a Linea account's 
+contents, it must know which empty leaf to overwrite, and exactly where these empty leaves are. A 
+further consideration is that we require the index of any 'new' leaf—an empty leaf being updated 
+so that it stores data—to be overwritten in a deterministic way. This requirement means that anyone 
+can theoretically reconstruct the tree simply by looking at transaction history. 
+
+To ensure consistency in the leaves' position, the state manager only ever inserts 'new' leaves to
+the left of the previous leaf in the tree. If this wasn't the case, and the state manager was able
+to insert any node in any position, it would be impossible to reconstitute the tree in the exact 
+same configuration, severely impacting the ability of L1 to verify the Merkle proof provided.
+
+## Applying the accumulator
+
+The Ethereum Virtual Machine (EVM) uses a variant of a Merkle tree known as a [Merkle-Patricia Trie](https://ethereum.org/en/developers/docs/data-structures-and-encoding/patricia-merkle-trie/) to track:
+- World state, which keeps track of accounts, and;
+- Account storage state (or simply 'storage'), which keeps track of the contents of each account. 
+
+On Linea, we adapt this structure. The Merkle-Patricia Trie is still used for world state, but the
+custom cryptographic accumulator described above is used for account storage state.
+
+The accumulator can perform the following operations:
+- **Insertion**: adding a new storage slot to the tree, triggered by storing a non-zero value in a 
+previously zero-valued slot.
+- **Update**: changing the value of an existing storage slot in the tree, triggered by storing a new 
+non-zero value in a previously non-zero slot.
+- **Deletion**: removing a storage slot from the tree, triggered by storing the zero value in a 
+previously non-zero slot.
+- **Read zero**: proving non-membership, triggered when a storage slot has been accessed, but not 
+updated, and its value is zero.
+- **Read non-zero**: proving membership, triggered when a storage slot has been accessed, but not 
+updated and its value is non-zero.
+
+These operations are applied to two trees; [world state](#world-state) and [account storage state](#account-storage-state).
+
+### World state
+
+The world state tree maps all accounts that exist on the blockchain—contracts and externally-owned 
+accounts (EOAs)—and points towards the account storage state for each. While on Ethereum Mainnet,
+this data is stored in a standard trie, Linea uses the the accumulator to map accounts as key:value 
+pairs. Otherwise, the implementation is similar to the EVM.
+
+Their structure is as follows:
+
+- `HKey`: Hash(`address`)
+- `Val`: Hash(`nonce`, `balance`, `storageRoot`, `codeHash`, `keccakCodeHash`, `CodeSize`)
+
+Critically, every piece of data fed into the `Val` (value) hash function must have a finite field
+interpretation. The data must be formatted this way to enable the Linea prover to correctly 
+access the world state when verifying proofs. Each element is formatted as follows (all elements
+require one field, other than `keccakCodeHash`):
+
+- `nonce`: The nonce is written in big-endian form into a `byte32`. For instance if the nonce is 10,
+then the nonce should be encoded as `0x000000000000000000000000000000000000000000000000000000000000000a`.
+- `balance`: Formatted the same as the nonce; big-endian `byte32`.
+- `storageRoot`: The storage root should _not_ be the Keccak of the Patricia trie root as in the 
+EVM, but the “custom Merkle tree” root of the account storage state that we describe in the 
+[following section](#account-storage-state).
+- `codeHash`: The code hash should not be the Keccak of the code, it should instead be the one 
+obtained as described in the following section.
+- 2 field elements for `keccakCodeHash`: One for the 128 most significant bits and one for the 128 
+least significant bits. The Keccak code hash corresponds exactly to the Keccak hash as specified by 
+the EVM (i.e. the output of [EXTCODEHASH](https://eips.ethereum.org/EIPS/eip-1052). We keep the 
+Keccak and the “custom” version for practical reasons.
+- `codeSize`: The code size should be the same value as that returned by the CODESIZE/EXTCODESIZE 
+opcodes.
+
+### Account storage state
+
+Also referred to as the storage trie, the account storage state is the database the state manager
+accesses to retrieve data about the contents of accounts. Account storage is mainly relevant for
+contract accounts; for EOAs, the data about assets and transactions is stored in the [world state](#world-state)
+`Val`, and the `codeHash` and its variants are empty. 
+
+Since the main function of account storage is to record contracts in such a way that they can be 
+easily retrieved and processed, it must efficiently encode the contract. It does this using the 
+following format:
+
+- `HKey`: Hash(`StorageKeyMSB`, `StorageKeyLSB`)
+- `Val`: Hash(`StorageValueMSB`, `StorageValueLSB`)
+
+In both cases, the `MSB` refers to the first 16 bytes of a 'word', and `LSB` the last 16. 'Word' in
+this context refers to the natural unit of data used by the EVM, which is 256-bit (32 byte) chunks. 
+
+For example, if the data regarding a contract's code was encoded in a `byte32`, the standard data
+type for words on the EVM and equivalents like Linea, it might look like this:
+```
+[a0, a1, a2, …., a15, b0, b1, …, b15]
+```
+
+That `byte32` would be split into an `MSB` and `LSB` like this:
+- `MSB`: `[0, 0, .., 0, a0, a1, a2, a3, .., a15]`
+- `LSB`: `[0, 0, .., 0, b0, b1, b2, b3, .., b15]`
+
+The `MSB` takes the first 16 bytes, and the `LSB` the second 16 bytes. 
+
+## Generating state-root-transition witnesses
+
+The accumulator, built using a sparse Merkle tree, is simultaneously: 
+
+- A data structure on which we can perform operations;
+- A dataset that we can summarize using a short string at any time (i.e. the root hash);
+- A tool that can be used by the Linea protocol to to verify that a given operation triggered a 
+transition from hash A to hash B.
+
+Once the accumulator has processed the trace information it receives about a new block and updated
+state accordingly, it can pass a new state root hash to the prover, via the coordinator. The state
+root hash can then be used by the prover as a "witness": a verifiable method of proving that the
+transactions in each block have taken place, without having to divulge the nature of those 
+transactions.