Implementation of the state rewind feature for the RocksDB #1996

xgreenx · 2024-06-27T23:45:34Z

Closes #451

Overview

Added support for the state rewind feature. The feature allows blocks from the past to be re-executed with the same execution results. Together with forkless upgrades, any past block can be executed, provided historical data exists for the target block height. The default size of the historical (rewind) window is 7 days.

Also added support for the rollback command when the state rewind feature is enabled. The command rolls the blockchain state back by several blocks, up to the end of the historical window.

Implementation details

The change adds a new HistoricalRocksDB type that wraps the regular RocksDB. It contains a second RocksDB instance that duplicates all tables and has one more column that stores the reverse modifications at each block height. A reverse modification is the opposite of the operation performed during the transition from block height X to X + 1. The screenshot below should describe the idea:

The key of each duplicated table is extended with the block height, and the value is the reverse operation needed to reach the state of the entry at the previous height. With the history of reverse operations, we can iterate back from the latest version of an entry to the previous one.
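To make the idea of reverse operations concrete, here is a minimal in-memory sketch. It is not the PR's code: `ReverseOp`, `RewindableState`, `apply_block`, and `rollback_last_block` are hypothetical names, and a `BTreeMap` stands in for RocksDB. Each applied block records how to undo its writes, and a rollback replays those records for the latest height.

```rust
use std::collections::BTreeMap;

/// Hypothetical reverse operation: how to undo one change to one key.
#[derive(Clone, Debug, PartialEq)]
enum ReverseOp {
    /// The key held this value before the block; put it back.
    Restore(Vec<u8>),
    /// The key did not exist before the block; delete it.
    Remove,
}

#[derive(Default)]
struct RewindableState {
    /// Latest state (stand-in for the regular tables).
    state: BTreeMap<Vec<u8>, Vec<u8>>,
    /// height -> reverse operations recorded while applying that block.
    reverse_ops: BTreeMap<u64, Vec<(Vec<u8>, ReverseOp)>>,
}

impl RewindableState {
    /// Applies a block's writes and records how to undo each of them.
    fn apply_block(&mut self, height: u64, writes: Vec<(Vec<u8>, Vec<u8>)>) {
        let mut undo = Vec::new();
        for (key, value) in writes {
            let op = match self.state.insert(key.clone(), value) {
                Some(old) => ReverseOp::Restore(old),
                None => ReverseOp::Remove,
            };
            undo.push((key, op));
        }
        self.reverse_ops.insert(height, undo);
    }

    /// Undoes the most recent block and returns the height that was rolled back.
    fn rollback_last_block(&mut self) -> Option<u64> {
        let (height, undo) = self.reverse_ops.pop_last()?;
        // Undo in reverse order so repeated writes to one key unwind correctly.
        for (key, op) in undo.into_iter().rev() {
            match op {
                ReverseOp::Restore(old) => {
                    self.state.insert(key, old);
                }
                ReverseOp::Remove => {
                    self.state.remove(&key);
                }
            }
        }
        Some(height)
    }
}
```

Calling `rollback_last_block` repeatedly walks the state back one height at a time until no recorded history remains, which mirrors the "several blocks behind until the end of the historical window" behavior described in the overview.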

Because RocksDB keeps keys sorted by default, lookup operations are fast and we don't need to iterate over all modifications. It is enough to find the reverse operation nearest to the target height.
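The sorted-key lookup can be sketched with a `BTreeMap`, which is ordered like RocksDB keys. This is an illustration under assumptions, not the PR's layout: `HistoryIndexSketch`, `commit`, and `get_at` are hypothetical, and the reverse operation is simplified to "the value the key held before the block" (`None` meaning the key did not exist). Reading a key at a target height seeks to the first reverse operation recorded after that height. If none exists, the latest value is still valid.

```rust
use std::collections::BTreeMap;

/// Value a key held before a block was applied; `None` means it did not exist.
type PreviousValue = Option<Vec<u8>>;

#[derive(Default)]
struct HistoryIndexSketch {
    /// Latest state (stand-in for the regular tables).
    latest: BTreeMap<Vec<u8>, Vec<u8>>,
    /// (key, height) -> value the key held before block `height` was applied.
    reverse_ops: BTreeMap<(Vec<u8>, u64), PreviousValue>,
}

impl HistoryIndexSketch {
    /// Applies changes for one block; `None` as the new value deletes the key.
    fn commit(&mut self, height: u64, changes: Vec<(Vec<u8>, Option<Vec<u8>>)>) {
        for (key, new_value) in changes {
            let previous = match new_value {
                Some(value) => self.latest.insert(key.clone(), value),
                None => self.latest.remove(&key),
            };
            // Keep the earliest "previous" value if a key changes twice in a block.
            self.reverse_ops.entry((key, height)).or_insert(previous);
        }
    }

    /// Reads `key` as it was right after block `height` was applied.
    fn get_at(&self, key: &[u8], height: u64) -> Option<Vec<u8>> {
        let Some(next_height) = height.checked_add(1) else {
            return self.latest.get(key).cloned();
        };
        // Sorted keys let us seek straight to the nearest later modification.
        let start = (key.to_vec(), next_height);
        let end = (key.to_vec(), u64::MAX);
        match self.reverse_ops.range(start..=end).next() {
            Some((_, previous)) => previous.clone(),
            None => self.latest.get(key).cloned(),
        }
    }
}
```

The single `range(...).next()` call plays the role of a RocksDB iterator seek: the cost does not depend on how many times the key was modified.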

Checklist

New behavior is reflected in tests

Before requesting review

I have reviewed the code myself
I have created follow-up issues caused by this PR and linked them here

crates/fuel-core/src/database.rs

Dentosal · 2024-07-03T12:21:09Z

CHANGELOG.md

@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
 ## [Unreleased]

 ### Added
+- [#1996](https://github.com/FuelLabs/fuel-core/pull/1996): Added support for rollback command when state rewind feature is enabled. The command allows the rollback of the state of the blockchain several blocks behind until the end of the historical window. The default historical window it 7 days.


Looks like this might also have breaking changes, at least some pub fields were made pub(crate).

crates/fuel-core/src/state/historical_rocksdb.rs

MitchTurner · 2024-07-03T18:57:34Z

If they were added since the last release, then it should be fine.

MitchTurner · 2024-07-03T19:28:26Z

bin/fuel-core/src/cli/rollback.rs

+ let shutdown_listener = ShutdownListener::spawn();
+ let target_block_height = command.target_block_height.into();
+
+ while !shutdown_listener.is_cancelled() {


nit: "While not is_cancelled" is kind of a confusing double negative. Maybe add a helper like:

fn is_still_live(&self) -> bool { !self.is_cancelled() }
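A self-contained sketch of the suggested helper follows. `ShutdownFlag` is a hypothetical stand-in for the real `ShutdownListener` (whose internals are not shown in this thread), and `rollback_to` only simulates the loop by decrementing a height.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for `ShutdownListener`.
#[derive(Default)]
struct ShutdownFlag {
    cancelled: AtomicBool,
}

impl ShutdownFlag {
    fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }

    /// Positive-form helper so loop conditions read without a negation.
    fn is_still_live(&self) -> bool {
        !self.is_cancelled()
    }
}

/// Simulated rollback loop: steps back one height per iteration until the
/// target is reached or shutdown is requested. Returns the final height.
fn rollback_to(current: u64, target: u64, shutdown: &ShutdownFlag) -> u64 {
    let mut height = current;
    while shutdown.is_still_live() && height > target {
        height -= 1;
    }
    height
}
```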

MitchTurner · 2024-07-03T20:18:28Z

ci_checks.sh

@@ -13,8 +13,8 @@
 # - `cargo install cargo-insta`
 # - `npm install prettier prettier-plugin-toml`

-npx prettier --check "**/Cargo.toml" &&


Maybe we rename ci_checks.sh since this isn't "checking" anymore?

MitchTurner · 2024-07-03T20:31:42Z

crates/fuel-core/src/database/database_description.rs

@@ -34,10 +40,14 @@ impl DatabaseHeight for DaBlockHeight {
 fn advance_height(&self) -> Option<Self> {
 self.0.checked_add(1).map(Into::into)
 }
+
+ fn rollback_height(&self) -> Option<Self> {


Hmm. When would we want to use this for DaBlockHeight? For example, we default the relayer to not have rollback.
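For context, the body of `rollback_height` is not shown in the hunk above. A plausible shape, assuming it simply mirrors `advance_height` with a checked subtraction, is sketched below. `DatabaseHeightSketch` and `HeightSketch` are hypothetical stand-ins, not the crate's types.

```rust
/// Hypothetical, trimmed-down version of a height trait.
trait DatabaseHeightSketch: Sized {
    /// Next height, or `None` on overflow.
    fn advance_height(&self) -> Option<Self>;
    /// Previous height, or `None` when already at zero.
    fn rollback_height(&self) -> Option<Self>;
}

/// Hypothetical newtype standing in for a block height.
#[derive(Clone, Copy, Debug, PartialEq)]
struct HeightSketch(u64);

impl DatabaseHeightSketch for HeightSketch {
    fn advance_height(&self) -> Option<Self> {
        self.0.checked_add(1).map(HeightSketch)
    }

    fn rollback_height(&self) -> Option<Self> {
        // Assumption: the mirror image of `advance_height`.
        self.0.checked_sub(1).map(HeightSketch)
    }
}
```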

Voxelot · 2024-07-03T23:23:54Z

crates/fuel-core/src/state/historical_rocksdb.rs

+ state_rewind_policy: StateRewindPolicy,
+ ) -> DatabaseResult<Self> {
+ let path = db.path().join("history");
+ let history = RocksDb::default_open(path, None)?;


I see some risks to this proliferating pattern of creating many distinct rocksdb instances:

We lose ACID properties, i.e. issues like the offchain and onchain db getting out of sync can now also happen with the historical changes getting out of sync. What protections are in place to safeguard us from a situation where the changes are committed to the historical database but not to the primary database? I'm concerned that we will introduce new complicated edge cases that are hard to resolve if we don't properly encapsulate all the changes into atomic database transactions, leading to brittleness under adverse conditions.

Resource contention overhead, such as too many open files, and more complexity in configuring rocksdb cache sizes. For example, should all rocksdb instances have equal cache limits? It seems to me that we should allocate more cache to the main databases than to the historical databases during regular block production, syncing, etc. However, if an RPC node gets lots of heavy traffic for historical views compared to the amount of work it needs to do to sync, then more cache should be allocated for historical changes. It's hard to predict which db instances would need more caching, as it will depend on the use case. If we didn't have separate DBs, rocksdb would automatically make the most of the allocated cache capacity for whatever usage scenario is thrown at it.

While I understand that the onchain and offchain updates are not atomic as they are driven by separate processes, I'm wondering if we could make the historical changes part of the same database that they are tracking the changes of. Given that the only way to recover from the offchain and onchain db's falling out of sync is to replay historical data, shouldn't we try to make the historical data as rock solid as possible given that it's part of the same CombinedDatabase::commit_changes function which should be atomic?

> We lose ACID properties, i.e. issues like the offchain and onchain db getting out of sync can now also happen with the historical changes getting out of sync. What protections are in place to safeguard us from a situation where the changes are committed to the historical database but not to the primary database? I'm concerned that we will introduce new complicated edge cases that are hard to resolve if we don't properly encapsulate all the changes into atomic database transactions, leading to brittleness under adverse conditions.

This situation is handled by committing to the historical database first and only afterward to the wrapped database. We can always roll back the history database to the same height as the wrapped database.

Below is an example of how we process this case while committing a new block height:
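The original example is not reproduced on this page. As a rough illustration only (not the PR's code), the ordering described above can be sketched as follows: history is committed first, the wrapped database second, and a recovery step drops any history entries above the wrapped database's height. `TwoStoreSketch` and all of its methods are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct TwoStoreSketch {
    /// height -> opaque reverse changes (stand-in for the history database).
    history: BTreeMap<u64, Vec<u8>>,
    /// Latest committed height of the wrapped (primary) database.
    wrapped_height: Option<u64>,
}

impl TwoStoreSketch {
    /// Step 1: persist the reverse changes for `height`.
    fn commit_history(&mut self, height: u64, reverse_changes: Vec<u8>) {
        self.history.insert(height, reverse_changes);
    }

    /// Step 2: persist the block in the wrapped database.
    fn commit_wrapped(&mut self, height: u64) {
        self.wrapped_height = Some(height);
    }

    /// Normal path: history first, wrapped database second.
    fn commit_block(&mut self, height: u64, reverse_changes: Vec<u8>) {
        self.commit_history(height, reverse_changes);
        self.commit_wrapped(height);
    }

    /// Drops history above the wrapped height; returns how many entries were removed.
    fn recover(&mut self) -> usize {
        let first_stale = match self.wrapped_height {
            Some(height) => height.checked_add(1),
            None => Some(0),
        };
        match first_stale {
            Some(height) => self.history.split_off(&height).len(),
            None => 0,
        }
    }
}
```

If the process stops between the two steps, the history store is at most one height ahead, and `recover` brings it back in line. The reverse failure (wrapped database ahead of history) cannot occur with this ordering.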

> Resource contention overhead, such as too many open files, and more complexity in configuring rocksdb cache sizes. For example, should all rocksdb instances have equal cache limits? It seems to me that we should allocate more cache to the main databases than to the historical databases during regular block production, syncing, etc. However, if an RPC node gets lots of heavy traffic for historical views compared to the amount of work it needs to do to sync, then more cache should be allocated for historical changes. It's hard to predict which db instances would need more caching, as it will depend on the use case. If we didn't have separate DBs, rocksdb would automatically make the most of the allocated cache capacity for whatever usage scenario is thrown at it.

I agree with these concerns. We need to improve the configurability of these parameters. I think it should be possible to store historical data in the same database. I've split it into two databases to have better control over the database configuration and be able to prune historical data just by deleting the folder without additional tools.

I could try to use only one database.

Do you think merging into one db can be a follow-up? Or should we handle this before landing the feature since it'll be disruptive to the database schema with two back to back big db changes?

crates/fuel-core/src/state/historical_rocksdb.rs

Voxelot · 2024-07-04T00:12:52Z

crates/fuel-core/src/state/historical_rocksdb.rs

+ &self,
+ height: &Description::Height,
+ ) -> StorageResult<ViewAtHeight<Description>> {
+ let latest_view = self.latest_view()?;


Should we do some kind of consistency check here? There could be a race condition where the latest view is not in sync with the historical database when the snapshots are created.

This situation is handled at the GenericDatabase level. When we commit to the database, we lock the current block height.

The same applies to view_at: we lock the block height, so a view can only be created outside of a commit.
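A minimal sketch of that locking scheme, assuming a single mutex around the current height. `GenericDatabaseSketch`, `ViewSketch`, and the method bodies are hypothetical and only illustrate why a view cannot observe a half-finished commit.

```rust
use std::sync::Mutex;

/// Hypothetical view handle: remembers which height it was created for.
#[derive(Debug, PartialEq)]
struct ViewSketch {
    height: u64,
    latest_at_creation: u64,
}

#[derive(Default)]
struct GenericDatabaseSketch {
    /// Current block height; the lock serializes commits and view creation.
    height: Mutex<Option<u64>>,
}

impl GenericDatabaseSketch {
    /// Holds the height lock for the whole commit.
    fn commit(&self, new_height: u64) -> Result<(), String> {
        let mut guard = self.height.lock().map_err(|e| e.to_string())?;
        // ... writes to the underlying storage would happen here, under the lock ...
        *guard = Some(new_height);
        Ok(())
    }

    /// Holds the same lock while creating the view, so it never overlaps a commit.
    fn view_at(&self, target: u64) -> Result<ViewSketch, String> {
        let guard = self.height.lock().map_err(|e| e.to_string())?;
        match *guard {
            Some(latest) if target <= latest => Ok(ViewSketch {
                height: target,
                latest_at_creation: latest,
            }),
            _ => Err(format!("height {target} is not available")),
        }
    }
}
```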

crates/fuel-core/src/state/historical_rocksdb.rs

Voxelot · 2024-07-04T22:09:50Z

crates/fuel-core/src/state/in_memory/memory_store.rs

+ _: &Description::Height,
+ ) -> StorageResult<KeyValueView<Self::Column>> {
+ // TODO: https://github.com/FuelLabs/fuel-core/issues/1995
+ unimplemented!("The historical view is not implemented for `MemoryStore`")


Should we return an error here instead? We should be extra cautious about adding any panics
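A sketch of the error-returning alternative, using hypothetical stand-in types (`StorageErrorSketch`, `MemoryStoreSketch`) rather than the crate's real `StorageResult` and `MemoryStore`:

```rust
/// Hypothetical error type standing in for the crate's storage error.
#[derive(Debug, PartialEq)]
enum StorageErrorSketch {
    Unsupported(&'static str),
}

/// Hypothetical stand-in for the in-memory store.
struct MemoryStoreSketch;

impl MemoryStoreSketch {
    /// Reports the missing feature as an error instead of panicking.
    fn view_at_height(&self, _height: u64) -> Result<(), StorageErrorSketch> {
        Err(StorageErrorSketch::Unsupported(
            "The historical view is not implemented for `MemoryStore`",
        ))
    }
}
```

The caller can then surface the failure through normal error handling, whereas `unimplemented!` would abort the calling task.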

@Dentosal

## Version v0.31.0

### Added
- [#2014](#2014): Added a separate thread for the block importer.
- [#2013](#2013): Added a separate thread to process P2P database lookups.
- [#2004](#2004): Added new CLI argument `continue-services-on-error` to control internal flow of services.
- [#2004](#2004): Added handling of incorrect shutdown of the off-chain GraphQL worker by using state rewind feature.
- [#2007](#2007): Improved metrics:
  - Added database metrics per column.
  - Added statistic about commit time of each database.
  - Refactored how metrics are registered: Now, we use only one register shared between all metrics. This global register is used to encode all metrics.
- [#1996](#1996): Added support for rollback command when state rewind feature is enabled. The command allows the rollback of the state of the blockchain several blocks behind until the end of the historical window. The default historical window it 7 days.
- [#1996](#1996): Added support for the state rewind feature. The feature allows the execution of the blocks in the past and the same execution results to be received. Together with forkless upgrades, execution of any block from the past is possible if historical data exist for the target block height.
- [#1994](#1994): Added the actual implementation for the `AtomicView::latest_view`.
- [#1972](#1972): Implement `AlgorithmUpdater` for `GasPriceService`
- [#1948](#1948): Add new `AlgorithmV1` and `AlgorithmUpdaterV1` for the gas price. Include tools for analysis
- [#1676](#1676): Added new CLI arguments:
  - `graphql-max-depth`
  - `graphql-max-complexity`
  - `graphql-max-recursive-depth`

### Changed
- [#2015](#2015): Small fixes for the database:
  - Fixed the name for historical columns - Metrics was working incorrectly for historical columns.
  - Added recommended setting for the RocksDB - The source of recommendation is official documentation https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#other-general-options.
  - Removed repairing since it could corrupt the database if fails - Several users reported about the corrupted state of the database after having a "Too many descriptors" error where in logs, repairing of the database also failed with this error creating a `lost` folder.
- [#2010](#2010): Updated the block importer to allow more blocks to be in the queue. It improves synchronization speed and mitigate the impact of other services on synchronization speed.
- [#2006](#2006): Process block importer events first under P2P pressure.
- [#2002](#2002): Adapted the block producer to react to checked transactions that were using another version of consensus parameters during validation in the TxPool. After an upgrade of the consensus parameters of the network, TxPool could store invalid `Checked` transactions. This change fixes that by tracking the version that was used to validate the transactions.
- [#1999](#1999): Minimize the number of panics in the codebase.
- [#1990](#1990): Use latest view for mutate GraphQL queries after modification of the node.
- [#1992](#1992): Parse multiple relayer contracts, `RELAYER-V2-LISTENING-CONTRACTS` env variable using a `,` delimiter.
- [#1980](#1980): Add `Transaction` to relayer 's event filter

#### Breaking
- [#2012](#2012): Bumped the `fuel-vm` to `0.55.0` release. More about the change [here](https://github.com/FuelLabs/fuel-vm/releases/tag/v0.55.0).
- [#2001](#2001): Prevent GraphQL query body to be huge and cause OOM. The default body size is `1MB`. The limit can be changed by the `graphql-request-body-bytes-limit` CLI argument.
- [#1991](#1991): Prepare the database to use different types than `Database` for atomic view.
- [#1989](#1989): Extract `HistoricalView` trait from the `AtomicView`.
- [#1676](#1676): New `fuel-core-client` is incompatible with the old `fuel-core` because of two requested new fields.
- [#1676](#1676): Changed default value for `api-request-timeout` to be `30s`.
- [#1676](#1676): Now, GraphQL API has complexity and depth limitations on the queries. The default complexity limit is `20000`. It is ~50 blocks per request with transaction IDs and ~2-5 full blocks.

### Fixed
- [#2000](#2000): Use correct query name in metrics for aliased queries.

## What's Changed
* Generate and publish code coverage reports in the CI by @Dentosal in #1947
* Gas Price Algorithm by @MitchTurner in #1948
* Use companies fork of the `publish-crates` action by @xgreenx in #1986
* Weekly `cargo update` by @github-actions in #1985
* Implement gas price updater for service by @MitchTurner in #1972
* Extract `HistoricalView` trait from the `AtomicView` by @xgreenx in #1989
* Use fresh `ReadView` for mutate queries by @xgreenx in #1990
* Prevent api spam with GQL complexity limits by @Voxelot in #1676
* Enable parsing multiple relayer listening contract addresses from environment variables by @Jurshsmith in #1992
* Prepare the database to use different types than `Database` for atomic view by @xgreenx in #1991
* Added the actual implementation for the `AtomicView::latest_view` by @xgreenx in #1994
* Weekly `cargo update` by @github-actions in #1998
* Minimize the number of panics in the codebase by @xgreenx in #1999
* feat: include Transaction events in topic0 filter for download_logs by @DefiCake in #1980
* Use correct query name for metrics by @xgreenx in #2000
* Prevent GraphQL query body to be huge and cause OOM by @xgreenx in #2001
* Adapted the block producer to react on the outdated transactions from the TxPool by @xgreenx in #2002
* Process block importer events first under P2P pressure by @xgreenx in #2006
* Implementation of the state rewind feature for the RocksDB by @xgreenx in #1996
* Upgraded `fuel-vm` to `0.55.0` by @xgreenx in #2012
* Improved metrics for the database by @xgreenx in #2007
* Updated block importer to allow more blocks to be queue by @xgreenx in #2010
* Added handling of incorrect shutdown of the off-chain GraphQL worker by @xgreenx in #2004
* Moved P2P database lookups into a separate thread by @xgreenx in #2013
* Use dedicated thread for the block importer by @xgreenx in #2014
* Small fixes for the database by @xgreenx in #2015

## New Contributors
* @Jurshsmith made their first contribution in #1992
* @DefiCake made their first contribution in #1980

**Full Changelog**: v0.30.0...v0.31.0

xgreenx and others added 28 commits June 26, 2024 09:21

Extract HistoricalView trait from the AtomicView

fea24a3

Updated changelog

4c20273

Removed not related code to the chagne

5119dbe

Fixed tests

4b3aecb

Fixed the column

b06c622

Move all functionality from the Database to the views

61f1059

Moved latest_height to HistoricalView trait

c5ec5bf

Merge branch 'refs/heads/feature/historical-view-trait' into feature/…

db90520

…iteratable-view # Conflicts: # crates/fuel-core/src/database.rs # crates/fuel-core/src/service/adapters/producer.rs

Improved readability

2b9eadc

Merge branch 'refs/heads/master' into feature/iteratable-view

c61477f

# Conflicts: # crates/fuel-core/src/database.rs # crates/fuel-core/src/service/adapters/producer.rs

Merge branch 'master' into feature/iteratable-view

c3d6f66

Updated CHANGELOG.md

29d543e

Simplify imports

682d256

Make CI happy

11d1d28

Added the actual implementation for the AtomicView::latest_view

187ec91

Updated CHANGELOG.md

ec86d61

Merge branch 'master' into feature/iteratable-view

22f6a63

Merge branch 'master' into feature/iteratable-view

46d0eca

Merge branch 'feature/iteratable-view' into feature/atomic-latest-view

19f7433

Simplify signature of the constracutor

f642a4a

Merge branch 'refs/heads/feature/iteratable-view' into feature/atomic…

c9f4ca2

…-latest-view # Conflicts: # crates/fuel-core/src/database.rs

The actual implementation of the historical view for RocksDB

7ad0907

Renamed IterableView into IterableKeyValueView

58c698e

Merged base branch

e6a02d6

Renamed IterableView to IterableKeyValueView

d5b6535

Merge branch 'refs/heads/feature/iteratable-view' into feature/atomic…

0bf402b

…-latest-view # Conflicts: # crates/fuel-core/src/database.rs

Merge branch 'refs/heads/feature/atomic-latest-view' into feature/his…

ffc6481

…torical-view-implementation

xgreenx self-assigned this Jun 27, 2024

Merge branch 'refs/heads/master' into feature/atomic-latest-view

97983c5

# Conflicts: # crates/fuel-core/src/database.rs

xgreenx added 2 commits July 2, 2024 11:19

Merge branch 'master' into feature/historical-view-implementation

0020b2f

Merge branch 'master' into feature/historical-view-implementation

af88cc9

xgreenx mentioned this pull request Jul 2, 2024

Added handling of incorrect shutdown of the off-chain GraphQL worker #2004

Merged

2 tasks

Merge branch 'master' into feature/historical-view-implementation

4d56031

Dentosal reviewed Jul 3, 2024

crates/fuel-core/src/database.rs (outdated, resolved)

Dentosal reviewed Jul 3, 2024

Merge branch 'master' into feature/historical-view-implementation

c466c13

MitchTurner reviewed Jul 3, 2024

Use better naming for variable=)

ee634e7

MitchTurner reviewed Jul 3, 2024

Voxelot reviewed Jul 3, 2024

Voxelot reviewed Jul 4, 2024

crates/fuel-core/src/state/historical_rocksdb.rs (resolved)

Voxelot reviewed Jul 4, 2024

crates/fuel-core/src/state/historical_rocksdb.rs (resolved)

xgreenx and others added 2 commits July 4, 2024 21:21

Merge branch 'master' into feature/historical-view-implementation

2633cec

Merge history and original databases into one

562171c

Voxelot reviewed Jul 4, 2024

Voxelot previously approved these changes Jul 4, 2024

Return error instead of panic

6885cb9

xgreenx dismissed Voxelot’s stale review via 6885cb9 July 4, 2024 22:14

xgreenx requested review from a team and Voxelot July 4, 2024 22:20

Voxelot approved these changes Jul 4, 2024

Dentosal approved these changes Jul 4, 2024

xgreenx enabled auto-merge (squash) July 4, 2024 22:30

xgreenx merged commit eb9c44b into master Jul 4, 2024
30 of 31 checks passed

xgreenx deleted the feature/historical-view-implementation branch July 4, 2024 22:45

xgreenx restored the feature/historical-view-implementation branch July 5, 2024 06:24

xgreenx mentioned this pull request Jul 5, 2024

Release v0.31.0 #2016

Merged
