
Prevent gossip of duplicate exits, slashings, BLS #5375

Closed

Conversation


@michaelsproul michaelsproul commented Mar 8, 2024

Issue Addressed

Some of the other clients have pulled us up for propagating duplicate exits, slashings and BLS to execution changes on gossip. The reason for this is that Lighthouse only checks for duplicates against an in-memory cache which is cleared on restart.

Proposed Changes

This PR fixes the issue by checking for duplicate exits/slashings/address-change messages against the head state. For example, if a validator is already exited in the state then we consider new exits for this validator as duplicates, and similarly for the other message types.

There are two broad approaches I considered when fixing this issue, each with different trade-offs.

Option 1: check against head state

This is the approach taken by this PR; a minimal sketch follows the list below. With flat (non-tree) states its performance is:

  • CPU: O(1) per validator, indexing into the BeaconState.
  • Memory: O(1). No additional storage beyond the state is required.

However with tree-states these random indexing operations will become more expensive, at which point we may want to reconsider:

  • CPU: O(log n) per validator for indexing into the BeaconState.
  • Memory: O(1).
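
To make the trade-off concrete, here is a minimal sketch of the head-state duplicate check for voluntary exits. It assumes simplified stand-in types (FAR_FUTURE_EPOCH, Validator, BeaconState) rather than Lighthouse's actual types, and the is_duplicate_exit helper is hypothetical, not the real verification code:

// Sketch only: simplified stand-ins for Lighthouse's types, illustrating the
// head-state duplicate check for voluntary exits.
const FAR_FUTURE_EPOCH: u64 = u64::MAX;

struct Validator {
    exit_epoch: u64,
}

struct BeaconState {
    validators: Vec<Validator>,
}

/// Treat an exit as a duplicate if the validator already has an exit epoch set in the
/// head state. With flat states the indexing below is O(1); with tree-states it becomes
/// O(log n), which is the trade-off noted above.
fn is_duplicate_exit(head_state: &BeaconState, validator_index: usize) -> bool {
    head_state
        .validators
        .get(validator_index)
        .map(|validator| validator.exit_epoch != FAR_FUTURE_EPOCH)
        .unwrap_or(false)
}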

Option 2: restore the cache on startup

There's a TODO in the code that's been there forever about reinitialising some of these caches on startup:

// TODO: allow for persisting and loading the pool from disk.
naive_aggregation_pool: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
naive_sync_aggregation_pool: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_attestations: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_sync_contributions: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_gossip_attesters: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_block_attesters: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_sync_contributors: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_aggregators: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_sync_aggregators: <_>::default(),
// TODO: allow for persisting and loading the pool from disk.
observed_block_producers: <_>::default(),
observed_blob_sidecars: <_>::default(),
observed_slashable: <_>::default(),
observed_voluntary_exits: <_>::default(),
observed_proposer_slashings: <_>::default(),
observed_attester_slashings: <_>::default(),
observed_bls_to_execution_changes: <_>::default(),

Rather than storing them on disk independently, we could reconstruct them from the head state (sketched after the list below). This has the advantage of providing O(1) indexing with tree-states, at the cost of slightly more memory:

  • CPU: O(1) for indexing into the HashSet inside ObservedOperations.
  • Memory: O(n) where n is the number of slashed/exited validators.
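
A minimal sketch of that reconstruction, again using simplified stand-in types rather than Lighthouse's real ObservedOperations (the rebuild_observed_exits name is hypothetical):

use std::collections::HashSet;

// Sketch only: rebuild the set of already-exited validator indices from the head state
// at startup, rather than persisting the cache to disk. Types are simplified stand-ins.
const FAR_FUTURE_EPOCH: u64 = u64::MAX;

struct Validator {
    exit_epoch: u64,
}

struct BeaconState {
    validators: Vec<Validator>,
}

/// Memory is O(n) in the number of exited validators; subsequent duplicate checks are
/// then O(1) HashSet lookups, regardless of flat vs tree-states.
fn rebuild_observed_exits(head_state: &BeaconState) -> HashSet<u64> {
    head_state
        .validators
        .iter()
        .enumerate()
        .filter(|(_, validator)| validator.exit_epoch != FAR_FUTURE_EPOCH)
        .map(|(index, _)| index as u64)
        .collect()
}

The same pattern would presumably apply to slashings and address changes by filtering on the relevant validator fields instead of the exit epoch.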

With tree-states we could also take a hybrid approach, e.g. using the slashings_cache introduced in #5279 to check for slashings, and initialising the exit cache on startup so that it indexes in O(1).

The complication would be making this work with reset_at_fork_boundary, which would probably have to do a full re-initialisation at each fork boundary (potentially very slow).

Additional Info

Thanks to @nisdas from Prysm for tracking these invalid messages down to Lighthouse peers, and for suggesting a fix (Prysm had a similar issue until recently and fixed it by using the head state).

@michaelsproul added the bug, ready-for-review, v5.2.0 and Networking labels on Mar 8, 2024

@jimmygchen jimmygchen left a comment


Nice and simple fix 👍 Looks good

@jimmygchen added the ready-for-merge label and removed the ready-for-review label on Mar 8, 2024
@michaelsproul (Member, Author) commented:

I'm now doubting whether this is necessary. AFAIK we should already be checking uniqueness in validate

Will mark this as WIP until we figure it out

@michaelsproul added the work-in-progress label and removed the ready-for-merge label on Mar 8, 2024