-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge unstable deneb june 6th #4477
Merge unstable deneb june 6th #4477
Conversation
## Issue Addressed sigp#4314 ## Proposed Changes avoid relocking head during builder health check ## Additional Info NA
## Issue Addressed - sigp#4293 - sigp#4264 ## Proposed Changes *Changes largely follow those suggested in the main issue*. - Add new routes to HTTP API - `post_beacon_blocks_v2` - `post_blinded_beacon_blocks_v2` - Add new routes to `BeaconNodeHttpClient` - `post_beacon_blocks_v2` - `post_blinded_beacon_blocks_v2` - Define new Eth2 common types - `BroadcastValidation`, enum representing the level of validation to apply to blocks prior to broadcast - `BroadcastValidationQuery`, the corresponding HTTP query string type for the above type - ~~Define `_checked` variants of both `publish_block` and `publish_blinded_block` that enforce a validation level at a type level~~ - Add interactive tests to the `bn_http_api_tests` test target covering each validation level (to their own test module, `broadcast_validation_tests`) - `beacon/blocks` - `broadcast_validation=gossip` - Invalid (400) - Full Pass (200) - Partial Pass (202) - `broadcast_validation=consensus` - Invalid (400) - Only gossip (400) - Only consensus pass (i.e., equivocates) (200) - Full pass (200) - `broadcast_validation=consensus_and_equivocation` - Invalid (400) - Invalid due to early equivocation (400) - Only gossip (400) - Only consensus (400) - Pass (200) - `beacon/blinded_blocks` - `broadcast_validation=gossip` - Invalid (400) - Full Pass (200) - Partial Pass (202) - `broadcast_validation=consensus` - Invalid (400) - Only gossip (400) - ~~Only consensus pass (i.e., equivocates) (200)~~ - Full pass (200) - `broadcast_validation=consensus_and_equivocation` - Invalid (400) - Invalid due to early equivocation (400) - Only gossip (400) - Only consensus (400) - Pass (200) - Add a new trait, `IntoGossipVerifiedBlock`, which allows type-level guarantees to be made as to gossip validity - Modify the structure of the `ObservedBlockProducers` cache from a `(slot, validator_index)` mapping to a `((slot, validator_index), block_root)` mapping - Modify `ObservedBlockProducers::proposer_has_been_observed` to return a `SeenBlock` rather than a boolean on success - Punish gossip peer (low) for submitting equivocating blocks - Rename `BlockError::SlashablePublish` to `BlockError::SlashableProposal` ## Additional Info This PR contains changes that directly modify how blocks are verified within the client. For more context, consult [comments in-thread](sigp#4316 (comment)). Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed [sigp#4259](sigp#4259) ## Proposed Changes debounce spammy `Unable to send message to the beacon processor` log messages ## Additional Info We could potentially debounce other logs that have the potential to be "spammy". After some feedback we decided to additionally add the following change: create a newtype wrapper around `mpsc::Sender<BeaconWorkEvent<T>>`. When there is an error on the try_send method on the wrapper, we increase a counter metric with one label per work type.
…tion (sigp#4362) ## Issue Addressed sigp#4118 ## Proposed Changes This PR introduces a "progressive balances" cache on the `BeaconState`, which keeps track of the accumulated target attestation balance for the current & previous epochs. The cached values are utilised by fork choice to calculate unrealized justification and finalization (instead of converting epoch participation arrays to balances for each block we receive). This optimization will be rolled out gradually to allow for more testing. A new `--progressive-balances disabled|checked|strict|fast` flag is introduced to support this: - `checked`: enabled with checks against participation cache, and falls back to the existing epoch processing calculation if there is a total target attester balance mismatch. There is no performance gain from this as the participation cache still needs to be computed. **This is the default mode for now.** - `strict`: enabled with checks against participation cache, returns error if there is a mismatch. **Used for testing only**. - `fast`: enabled with no comparative checks and without computing the participation cache. This mode gives us the performance gains from the optimization. This is still experimental and not currently recommended for production usage, but will become the default mode in a future release. - `disabled`: disable the usage of progressive cache, and use the existing method for FFG progression calculation. This mode may be useful if we find a bug and want to stop the frequent error logs. ### Tasks - [x] Initial cache implementation in `BeaconState` - [x] Perform checks in fork choice to compare the progressive balances cache against results from `ParticipationCache` - [x] Add CLI flag, and disable the optimization by default - [x] Testing on Goerli & Benchmarking - [x] Move caching logic from state processing to the `ProgressiveBalancesCache` (see [this comment](sigp#4362 (comment))) - [x] Add attesting balance metrics Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
We now officially have ipv6 support. The mainnet bootnodes have been updated to support ipv6. This PR updates lighthouse's internal bootnodes for mainnet to avoid fetching them on initial load.
## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA
## Issue Addressed sigp#4331 ## Proposed Changes - Use comparison rather than strict equality between the earliest epoch we know about and the backfill target (which will be the most recent WSP by default or genesis) - Add helper function `BackFillSync<T>::would_complete` to achieve this in one location ## Additional Info - There's an ad hoc test for this in sigp#4461 Co-authored-by: Age Manning <Age@AgeManning.com>
…m/realbigsean/lighthouse into merge-unstable-deneb-june-6th
## Issue Addressed [Users on Twitter](https://twitter.com/ashekhirin/status/1676334843192397824) are getting checkpoint sync URL timeouts with the default of 60s, so this PR increases the default timeout to 3 minutes. I've also added a short section to the book about adjusting the timeout with `--checkpoint-sync-url-timeout`.
*Replaces sigp#4434. It is identical, but this PR has a smaller diff due to a curated commit history.* ## Issue Addressed NA ## Proposed Changes This PR moves the scheduling logic for the `BeaconProcessor` into a new crate in `beacon_node/beacon_processor`. Previously it existed in the `beacon_node/network` crate. This addresses a circular-dependency problem where it's not possible to use the `BeaconProcessor` from the `beacon_chain` crate. The `network` crate depends on the `beacon_chain` crate (`network -> beacon_chain`), but importing the `BeaconProcessor` into the `beacon_chain` crate would create a circular dependancy of `beacon_chain -> network`. The `BeaconProcessor` was designed to provide queuing and prioritized scheduling for messages from the network. It has proven to be quite valuable and I believe we'd make Lighthouse more stable and effective by using it elsewhere. In particular, I think we should use the `BeaconProcessor` for: 1. HTTP API requests. 1. Scheduled tasks in the `BeaconChain` (e.g., state advance). Using the `BeaconProcessor` for these tasks would help prevent the BN from becoming overwhelmed and would also help it to prioritize operations (e.g., choosing to process blocks from gossip before responding to low-priority HTTP API requests). ## Additional Info This PR is intended to have zero impact on runtime behaviour. It aims to simply separate the *scheduling* code (i.e., the `BeaconProcessor`) from the *business logic* in the `network` crate (i.e., the `Worker` impls). Future PRs (see sigp#4462) can build upon these works to actually use the `BeaconProcessor` for more operations. I've gone to some effort to use `git mv` to make the diff look more like "file was moved and modified" rather than "file was deleted and a new one added". This should reduce review burden and help maintain commit attribution.
This reverts commit 246d52d.
## Proposed Changes * Add `lcli state-root` command for computing the hash tree root of a `BeaconState`. * Add a `--network` flag which can be used instead of `--testnet-dir` to set the network, e.g. Mainnet, Goerli, Gnosis. * Use the new network flag in `transition-blocks`, `skip-slots`, and `block-root`, which previously only supported mainnet. * **BREAKING CHANGE** Remove the default value of `~/.lighthouse/testnet` from `--testnet-dir`. This may have made sense in previous versions where `lcli` was more testnet focussed, but IMO it is an unnecessary complication and foot-gun today.
## Issue Addressed sigp#4494 ## Proposed Changes - Remove explicit re-exports of various types to appease the new compiler lint ## Additional Info It seems `warn(hidden_glob_reexports)` is the main culprit.
…rge-unstable-deneb-june-6th
## Issue Addressed Addresses issue: sigp#4459 ## Proposed Changes `cargo update -p proc-macro2` proc-macro2 v1.0.58 -> v1.0.63 ## Additional Info See: sigp#4459 / rust-lang/rust#113152
## Issue Addressed Fix an issue observed by `@zlan` on Discord where Lighthouse would sometimes return this error when looking up states via the API: > {"code":500,"message":"UNHANDLED_ERROR: ForkChoiceError(MissingProtoArrayBlock(0xc9cf1495421b6ef3215d82253b388d77321176a1dcef0db0e71a0cd0ffc8cdb7))","stacktraces":[]} ## Proposed Changes The error stems from a faulty assumption in the HTTP API logic: that any state in the hot database must have its block in fork choice. This isn't true because the state's hot database may update much less frequently than the fork choice store, e.g. if reconstructing states (where freezer migration pauses), or if the freezer migration runs slowly. There could also be a race between loading the hot state and checking fork choice, e.g. even if the finalization migration of DB+fork choice were atomic, the update could happen between the 1st and 2nd calls. To address this I've changed the HTTP API logic to use the finalized block's execution status as a fallback where it is safe to do so. In the case where a block is non-canonical and prior to finalization (permanently orphaned) we default `execution_optimistic` to `true`. ## Additional Info I've also added a new CLI flag to reduce the frequency of the finalization migration as this is useful for several purposes: - Spacing out database writes (less frequent, larger batches) - Keeping a limited chain history with high availability, e.g. the last month in the hot database. This new flag made it _substantially_ easier to test this change. It was extracted from `tree-states` (where it's called `--db-migration-period`), which is why this PR also carries the `tree-states` label.
## Issue Addressed When trying to run `eth1-sim` locally, the simulator doesn't start for me, and panicked due to duplicate arg names for `proposer-nodes` (using same arg names as `nodes`). Not sure why this isn't failing on CI but failing on mine 🤔 ``` thread 'main' panicked at 'Argument short must be unique thread 'main' panicked at 'Argument long must be unique ```
## Issue Addressed sigp#4488 ## Proposed Changes - Remove all instances of the `required` modifier where we have a default value specified for a subcommand ## Additional Info N/A
## Issue Addressed Fixes occasional compilation errors with mev-rs (see sigp#4456). ## Proposed Changes - Update `mev-rs` to the latest version, which allows us to remove hacky `[patch]` sections - Update the `axum` version used in `watch` so LH only uses a single version
## Proposed Changes Replace `wget` in the EF-tests makefile with `curl`. On macOS `curl` is pre-installed, and I found myself making this change to avoid installing `wget`. The `-L` flag is used to follow redirects which is useful if a repo gets renamed, and more similar to `wget`'s default behaviour.
## Issue Addressed Addresses an issue where CI could fail due to an nonexisting file error: ``` Run ./clean.sh rm: cannot remove '/home/runner/.lighthouse/local-testnet/geth_datadir4/geth/fastcache.tmp.1549331618': No such file or directory Error: Process completed with exit code 1. ``` This seems to happen quite frequently now, I'm not sure exactly why but perhaps worth trying suppressing the prompt? https://github.com/sigp/lighthouse/actions/runs/5455027574/jobs/9925916159?pr=4463
## Issue Addressed This PR attempts to workaround the recent frequent eth1 simulator failures caused by missing eth logs from Anvil. > FailedToInsertDeposit(NonConsecutive { log_index: 1, expected: 0 }) This usually occurs at the beginning of the tests, and it guarantees a timeout after a few hours if this log shows up, and this is currently causing our CIs to fail quite frequently. Example failure here: https://github.com/sigp/lighthouse/actions/runs/5525760195/jobs/10079736914 ## Proposed Changes The quick fix applied here adds a timeout to node startup and restarts the node again. - Add a 60 seconds timeout to beacon node startup in eth1 simulator tests. It takes ~10 seconds on my machine, but could take longer on CI runners. - Wrap the startup code in a retry function, that allows for 3 retries before returning an error. ## Additional Info We should probably raise an issue under the Anvil GitHub repo there so this can be further investigated.
## Issue Addressed N/A ## Proposed Changes Add lints for rust 1.71 [3789134](sigp@3789134) is probably the one that needs most attention as it changes beacon state code. I changed the `is_in_inactivity_leak ` function to return a `ArithError` as not all consumers of that function work well with a `BeaconState::Error`.
## Issue Addressed Speed up CI by installing foundry with Github action instead of building Anvil from source. Building anvil from source on GItHub hosted runners currently takes about 10 mins. Using the `foundry-toolchain` action to install only takes about 2 seconds.
…sts (sigp#4485) ## Issue Addressed n/a Noticed this while working on something else ## Proposed Changes - leverage the appropriate types to avoid a bunch of `unwrap` and errors ## Additional Info n/a
@realbigsean The branch builds and passes lint, but I'm not sure about the correctness of the merge since it appeared to be a tricky one. I've added comments on places where I added additional code due to the |
…rge-unstable-deneb-jul-14
…albigsean/lighthouse into merge-unstable-deneb-jul-14
Merge unstable deneb jul 14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I mainly looked at changes in deneb stuff resulting from the merge and paid some extra attention to changes resulting from:
- Broadcast validation PR
- Changes in beacon processor because of moving it to a separate module
I'm satisfied with the changes. Amazing work with the changes, the conflicts would have been nasty.
Feel free to merge (i think there's one conflict remaining still)
@@ -2798,6 +2831,9 @@ impl<T: BeaconChainTypes> BeaconChain<T> { | |||
notify_execution_layer, | |||
)?; | |||
|
|||
//TODO(sean) error handling? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably something that makes more sense in the original PR. I say we keep it as is here and change it only if required for something deneb specific.
Also, the errors are being handled at the calling site as well, so not too much of an issue imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think the thing here was I was wondering if I should pass the error through handle_block_error
, but that method just prints out a log. I think publish_fn
handles the logging fine, there's only a couple error cases and they're logged or returned via API errors
@realbigsean merged |
Issue Addressed
Which issue # does this PR address?
Proposed Changes
Please list or describe the changes introduced by this PR.
Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.