Merge `unstable` into tree-states #4514

michaelsproul · 2023-07-18T07:57:25Z

THIS PR MUST NOT BE SQUASH MERGED

Proposed Changes

Back-merge the latest unstable to tree-states. This consolidates --db-migration-period (tree-states flag) with --epochs-per-migration (unstable flag). It also adds the phase0 rewards API which might be fun/useful on tree-states.

We now officially have ipv6 support. The mainnet bootnodes have been updated to support ipv6. This PR updates lighthouse's internal bootnodes for mainnet to avoid fetching them on initial load.

## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA

## Issue Addressed sigp#4331 ## Proposed Changes - Use comparison rather than strict equality between the earliest epoch we know about and the backfill target (which will be the most recent WSP by default or genesis) - Add helper function `BackFillSync<T>::would_complete` to achieve this in one location ## Additional Info - There's an ad hoc test for this in sigp#4461 Co-authored-by: Age Manning <Age@AgeManning.com>

## Issue Addressed [Users on Twitter](https://twitter.com/ashekhirin/status/1676334843192397824) are getting checkpoint sync URL timeouts with the default of 60s, so this PR increases the default timeout to 3 minutes. I've also added a short section to the book about adjusting the timeout with `--checkpoint-sync-url-timeout`.

*Replaces sigp#4434. It is identical, but this PR has a smaller diff due to a curated commit history.* ## Issue Addressed NA ## Proposed Changes This PR moves the scheduling logic for the `BeaconProcessor` into a new crate in `beacon_node/beacon_processor`. Previously it existed in the `beacon_node/network` crate. This addresses a circular-dependency problem where it's not possible to use the `BeaconProcessor` from the `beacon_chain` crate. The `network` crate depends on the `beacon_chain` crate (`network -> beacon_chain`), but importing the `BeaconProcessor` into the `beacon_chain` crate would create a circular dependancy of `beacon_chain -> network`. The `BeaconProcessor` was designed to provide queuing and prioritized scheduling for messages from the network. It has proven to be quite valuable and I believe we'd make Lighthouse more stable and effective by using it elsewhere. In particular, I think we should use the `BeaconProcessor` for: 1. HTTP API requests. 1. Scheduled tasks in the `BeaconChain` (e.g., state advance). Using the `BeaconProcessor` for these tasks would help prevent the BN from becoming overwhelmed and would also help it to prioritize operations (e.g., choosing to process blocks from gossip before responding to low-priority HTTP API requests). ## Additional Info This PR is intended to have zero impact on runtime behaviour. It aims to simply separate the *scheduling* code (i.e., the `BeaconProcessor`) from the *business logic* in the `network` crate (i.e., the `Worker` impls). Future PRs (see sigp#4462) can build upon these works to actually use the `BeaconProcessor` for more operations. I've gone to some effort to use `git mv` to make the diff look more like "file was moved and modified" rather than "file was deleted and a new one added". This should reduce review burden and help maintain commit attribution.

## Proposed Changes * Add `lcli state-root` command for computing the hash tree root of a `BeaconState`. * Add a `--network` flag which can be used instead of `--testnet-dir` to set the network, e.g. Mainnet, Goerli, Gnosis. * Use the new network flag in `transition-blocks`, `skip-slots`, and `block-root`, which previously only supported mainnet. * **BREAKING CHANGE** Remove the default value of `~/.lighthouse/testnet` from `--testnet-dir`. This may have made sense in previous versions where `lcli` was more testnet focussed, but IMO it is an unnecessary complication and foot-gun today.

## Issue Addressed sigp#4494 ## Proposed Changes - Remove explicit re-exports of various types to appease the new compiler lint ## Additional Info It seems `warn(hidden_glob_reexports)` is the main culprit.

## Issue Addressed Addresses issue: sigp#4459 ## Proposed Changes `cargo update -p proc-macro2` proc-macro2 v1.0.58 -> v1.0.63 ## Additional Info See: sigp#4459 / rust-lang/rust#113152

## Issue Addressed Fix an issue observed by `@zlan` on Discord where Lighthouse would sometimes return this error when looking up states via the API: > {"code":500,"message":"UNHANDLED_ERROR: ForkChoiceError(MissingProtoArrayBlock(0xc9cf1495421b6ef3215d82253b388d77321176a1dcef0db0e71a0cd0ffc8cdb7))","stacktraces":[]} ## Proposed Changes The error stems from a faulty assumption in the HTTP API logic: that any state in the hot database must have its block in fork choice. This isn't true because the state's hot database may update much less frequently than the fork choice store, e.g. if reconstructing states (where freezer migration pauses), or if the freezer migration runs slowly. There could also be a race between loading the hot state and checking fork choice, e.g. even if the finalization migration of DB+fork choice were atomic, the update could happen between the 1st and 2nd calls. To address this I've changed the HTTP API logic to use the finalized block's execution status as a fallback where it is safe to do so. In the case where a block is non-canonical and prior to finalization (permanently orphaned) we default `execution_optimistic` to `true`. ## Additional Info I've also added a new CLI flag to reduce the frequency of the finalization migration as this is useful for several purposes: - Spacing out database writes (less frequent, larger batches) - Keeping a limited chain history with high availability, e.g. the last month in the hot database. This new flag made it _substantially_ easier to test this change. It was extracted from `tree-states` (where it's called `--db-migration-period`), which is why this PR also carries the `tree-states` label.

## Issue Addressed When trying to run `eth1-sim` locally, the simulator doesn't start for me, and panicked due to duplicate arg names for `proposer-nodes` (using same arg names as `nodes`). Not sure why this isn't failing on CI but failing on mine 🤔 ``` thread 'main' panicked at 'Argument short must be unique thread 'main' panicked at 'Argument long must be unique ```

## Issue Addressed sigp#4488 ## Proposed Changes - Remove all instances of the `required` modifier where we have a default value specified for a subcommand ## Additional Info N/A

## Issue Addressed Fixes occasional compilation errors with mev-rs (see sigp#4456). ## Proposed Changes - Update `mev-rs` to the latest version, which allows us to remove hacky `[patch]` sections - Update the `axum` version used in `watch` so LH only uses a single version

## Proposed Changes Replace `wget` in the EF-tests makefile with `curl`. On macOS `curl` is pre-installed, and I found myself making this change to avoid installing `wget`. The `-L` flag is used to follow redirects which is useful if a repo gets renamed, and more similar to `wget`'s default behaviour.

## Issue Addressed Addresses an issue where CI could fail due to an nonexisting file error: ``` Run ./clean.sh rm: cannot remove '/home/runner/.lighthouse/local-testnet/geth_datadir4/geth/fastcache.tmp.1549331618': No such file or directory Error: Process completed with exit code 1. ``` This seems to happen quite frequently now, I'm not sure exactly why but perhaps worth trying suppressing the prompt? https://github.com/sigp/lighthouse/actions/runs/5455027574/jobs/9925916159?pr=4463

## Issue Addressed This PR attempts to workaround the recent frequent eth1 simulator failures caused by missing eth logs from Anvil. > FailedToInsertDeposit(NonConsecutive { log_index: 1, expected: 0 }) This usually occurs at the beginning of the tests, and it guarantees a timeout after a few hours if this log shows up, and this is currently causing our CIs to fail quite frequently. Example failure here: https://github.com/sigp/lighthouse/actions/runs/5525760195/jobs/10079736914 ## Proposed Changes The quick fix applied here adds a timeout to node startup and restarts the node again. - Add a 60 seconds timeout to beacon node startup in eth1 simulator tests. It takes ~10 seconds on my machine, but could take longer on CI runners. - Wrap the startup code in a retry function, that allows for 3 retries before returning an error. ## Additional Info We should probably raise an issue under the Anvil GitHub repo there so this can be further investigated.

## Issue Addressed N/A ## Proposed Changes Add lints for rust 1.71 [3789134](sigp@3789134) is probably the one that needs most attention as it changes beacon state code. I changed the `is_in_inactivity_leak ` function to return a `ArithError` as not all consumers of that function work well with a `BeaconState::Error`.

## Issue Addressed Speed up CI by installing foundry with Github action instead of building Anvil from source. Building anvil from source on GItHub hosted runners currently takes about 10 mins. Using the `foundry-toolchain` action to install only takes about 2 seconds.

…sts (sigp#4485) ## Issue Addressed n/a Noticed this while working on something else ## Proposed Changes - leverage the appropriate types to avoid a bunch of `unwrap` and errors ## Additional Info n/a

## Issue Addressed Addresses sigp#4026. Beacon-API spec [here](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Beacon/getAttestationsRewards). Endpoint: `POST /eth/v1/beacon/rewards/attestations/{epoch}` This endpoint already supports post-Altair epochs. This PR adds support for phase 0 rewards calculation. ## Proposed Changes - [x] Attestation rewards API to support phase 0 rewards calculation, re-using logic from `state_processing`. Refactored `get_attestation_deltas` slightly to support computing deltas for a subset of validators. - [x] Add `inclusion_delay` to `ideal_rewards` (`beacon-API` spec update to follow) - [x] Add `inactivity` penalties to both `ideal_rewards` and `total_rewards` (`beacon-API` spec update to follow) - [x] Add tests to compute attestation rewards and compare results with beacon states ## Additional Notes - The extra penalty for missing attestations or being slashed during an inactivity leak is currently not included in the API response (for both phase 0 and Altair) in the spec. - I went with adding `inactivity` as a separate component rather than combining them with the 4 rewards, because this is how it was grouped in [the phase 0 spec](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/beacon-chain.md#get_attestation_deltas). During inactivity leak, all rewards include the optimal reward, and inactivity penalties are calculated separately (see below code snippet from the spec), so it would be quite confusing if we merge them. This would also work better with Altair, because there's no "cancelling" of rewards and inactivity penalties are more separate. - Altair calculation logic (to include inactivity penalties) to be updated in a follow-up PR. ```python def get_attestation_deltas(state: BeaconState) -> Tuple[Sequence[Gwei], Sequence[Gwei]]: """ Return attestation reward/penalty deltas for each validator. """ source_rewards, source_penalties = get_source_deltas(state) target_rewards, target_penalties = get_target_deltas(state) head_rewards, head_penalties = get_head_deltas(state) inclusion_delay_rewards, _ = get_inclusion_delay_deltas(state) _, inactivity_penalties = get_inactivity_penalty_deltas(state) rewards = [ source_rewards[i] + target_rewards[i] + head_rewards[i] + inclusion_delay_rewards[i] for i in range(len(state.validators)) ] penalties = [ source_penalties[i] + target_penalties[i] + head_penalties[i] + inactivity_penalties[i] for i in range(len(state.validators)) ] return rewards, penalties ``` ## Example API Response <details> <summary>Click me</summary> ```json { "ideal_rewards": [ { "effective_balance": "1000000000", "head": "6638", "target": "6638", "source": "6638", "inclusion_delay": "9783", "inactivity": "0" }, { "effective_balance": "2000000000", "head": "13276", "target": "13276", "source": "13276", "inclusion_delay": "19565", "inactivity": "0" }, { "effective_balance": "3000000000", "head": "19914", "target": "19914", "source": "19914", "inclusion_delay": "29349", "inactivity": "0" }, { "effective_balance": "4000000000", "head": "26553", "target": "26553", "source": "26553", "inclusion_delay": "39131", "inactivity": "0" }, { "effective_balance": "5000000000", "head": "33191", "target": "33191", "source": "33191", "inclusion_delay": "48914", "inactivity": "0" }, { "effective_balance": "6000000000", "head": "39829", "target": "39829", "source": "39829", "inclusion_delay": "58697", "inactivity": "0" }, { "effective_balance": "7000000000", "head": "46468", "target": "46468", "source": "46468", "inclusion_delay": "68480", "inactivity": "0" }, { "effective_balance": "8000000000", "head": "53106", "target": "53106", "source": "53106", "inclusion_delay": "78262", "inactivity": "0" }, { "effective_balance": "9000000000", "head": "59744", "target": "59744", "source": "59744", "inclusion_delay": "88046", "inactivity": "0" }, { "effective_balance": "10000000000", "head": "66383", "target": "66383", "source": "66383", "inclusion_delay": "97828", "inactivity": "0" }, { "effective_balance": "11000000000", "head": "73021", "target": "73021", "source": "73021", "inclusion_delay": "107611", "inactivity": "0" }, { "effective_balance": "12000000000", "head": "79659", "target": "79659", "source": "79659", "inclusion_delay": "117394", "inactivity": "0" }, { "effective_balance": "13000000000", "head": "86298", "target": "86298", "source": "86298", "inclusion_delay": "127176", "inactivity": "0" }, { "effective_balance": "14000000000", "head": "92936", "target": "92936", "source": "92936", "inclusion_delay": "136959", "inactivity": "0" }, { "effective_balance": "15000000000", "head": "99574", "target": "99574", "source": "99574", "inclusion_delay": "146742", "inactivity": "0" }, { "effective_balance": "16000000000", "head": "106212", "target": "106212", "source": "106212", "inclusion_delay": "156525", "inactivity": "0" }, { "effective_balance": "17000000000", "head": "112851", "target": "112851", "source": "112851", "inclusion_delay": "166307", "inactivity": "0" }, { "effective_balance": "18000000000", "head": "119489", "target": "119489", "source": "119489", "inclusion_delay": "176091", "inactivity": "0" }, { "effective_balance": "19000000000", "head": "126127", "target": "126127", "source": "126127", "inclusion_delay": "185873", "inactivity": "0" }, { "effective_balance": "20000000000", "head": "132766", "target": "132766", "source": "132766", "inclusion_delay": "195656", "inactivity": "0" }, { "effective_balance": "21000000000", "head": "139404", "target": "139404", "source": "139404", "inclusion_delay": "205439", "inactivity": "0" }, { "effective_balance": "22000000000", "head": "146042", "target": "146042", "source": "146042", "inclusion_delay": "215222", "inactivity": "0" }, { "effective_balance": "23000000000", "head": "152681", "target": "152681", "source": "152681", "inclusion_delay": "225004", "inactivity": "0" }, { "effective_balance": "24000000000", "head": "159319", "target": "159319", "source": "159319", "inclusion_delay": "234787", "inactivity": "0" }, { "effective_balance": "25000000000", "head": "165957", "target": "165957", "source": "165957", "inclusion_delay": "244570", "inactivity": "0" }, { "effective_balance": "26000000000", "head": "172596", "target": "172596", "source": "172596", "inclusion_delay": "254352", "inactivity": "0" }, { "effective_balance": "27000000000", "head": "179234", "target": "179234", "source": "179234", "inclusion_delay": "264136", "inactivity": "0" }, { "effective_balance": "28000000000", "head": "185872", "target": "185872", "source": "185872", "inclusion_delay": "273918", "inactivity": "0" }, { "effective_balance": "29000000000", "head": "192510", "target": "192510", "source": "192510", "inclusion_delay": "283701", "inactivity": "0" }, { "effective_balance": "30000000000", "head": "199149", "target": "199149", "source": "199149", "inclusion_delay": "293484", "inactivity": "0" }, { "effective_balance": "31000000000", "head": "205787", "target": "205787", "source": "205787", "inclusion_delay": "303267", "inactivity": "0" }, { "effective_balance": "32000000000", "head": "212426", "target": "212426", "source": "212426", "inclusion_delay": "313050", "inactivity": "0" } ], "total_rewards": [ { "validator_index": "0", "head": "212426", "target": "212426", "source": "212426", "inclusion_delay": "313050", "inactivity": "0" }, { "validator_index": "32", "head": "212426", "target": "212426", "source": "212426", "inclusion_delay": "313050", "inactivity": "0" }, { "validator_index": "63", "head": "-357771", "target": "-357771", "source": "-357771", "inclusion_delay": "0", "inactivity": "0" } ] } ``` </details>

AgeManning and others added 19 commits July 3, 2023 03:20

Ipv6 bootnodes update (sigp#4394)

8e65419

We now officially have ipv6 support. The mainnet bootnodes have been updated to support ipv6. This PR updates lighthouse's internal bootnodes for mainnet to avoid fetching them on initial load.

Release v4.3.0 (sigp#4452)

dfcb336

## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA

Remove hidden re-exports to appease Rust 1.73 (sigp#4495)

62c9170

## Issue Addressed sigp#4494 ## Proposed Changes - Remove explicit re-exports of various types to appease the new compiler lint ## Additional Info It seems `warn(hidden_glob_reexports)` is the main culprit.

bump proc-macro2 (sigp#4464)

0c7eed5

## Issue Addressed Addresses issue: sigp#4459 ## Proposed Changes `cargo update -p proc-macro2` proc-macro2 v1.0.58 -> v1.0.63 ## Additional Info See: sigp#4459 / rust-lang/rust#113152

Remove instances of required arguments with default values (sigp#4489)

1a5de8b

## Issue Addressed sigp#4488 ## Proposed Changes - Remove all instances of the `required` modifier where we have a default value specified for a subcommand ## Additional Info N/A

Cleanup unreachable code in lcli::generate_bootnode_enr and some te…

4435a22

…sts (sigp#4485) ## Issue Addressed n/a Noticed this while working on something else ## Proposed Changes - leverage the appropriate types to avoid a bunch of `unwrap` and errors ## Additional Info n/a

michaelsproul added work-in-progress PR is a work-in-progress tree-states Upcoming state and database overhaul labels Jul 18, 2023

Merge remote-tracking branch 'origin/unstable' into tree-states

8423e9f

michaelsproul force-pushed the tree-states-merge-unstable branch from dda2392 to 8423e9f Compare July 19, 2023 01:24

michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Jul 19, 2023

jimmygchen approved these changes Jul 19, 2023

View reviewed changes

michaelsproul merged commit 8423e9f into sigp:tree-states Jul 19, 2023
29 checks passed

michaelsproul deleted the tree-states-merge-unstable branch July 19, 2023 04:13

jimmygchen mentioned this pull request Jul 20, 2023

Beacon chain lags behind frequently when running the latest tree-states branch #4521

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge `unstable` into tree-states #4514

Merge `unstable` into tree-states #4514

michaelsproul commented Jul 18, 2023

Merge unstable into tree-states #4514

Merge unstable into tree-states #4514

Conversation

michaelsproul commented Jul 18, 2023

Proposed Changes

Merge `unstable` into tree-states #4514

Merge `unstable` into tree-states #4514