Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct gossipsub mesh and connected peer inconsistencies #6244

Merged
merged 8 commits into from
Oct 30, 2024

Conversation

AgeManning
Copy link
Member

Issue Addressed

We are seeing errors in gossipsub related to an inconsistent state between mesh peers and connected peers.

There appears to be a few problems. This PR will be updated as we find all the bugs.

The first I have noticed is that the IDONTWANT messages also look for in-flight gossipsub promises. When a peer disconnects we dont remove the tracking of in-flight gossipsub promises, because it is inefficient search (it's not index'd by peer-id). Therefore we can have disconnected peers in this mapping and if we go to send an IDONTWANT message, we will not have this peer-id in the connected mapping.

Am currently still searching for the others....

@AgeManning AgeManning requested a review from jxs August 8, 2024 05:17
@AgeManning AgeManning marked this pull request as ready for review August 8, 2024 07:36
@AgeManning AgeManning added the ready-for-review The code is ready for review label Aug 8, 2024
@AgeManning
Copy link
Member Author

Ok, found the other bug. This should fix the libp2p issues I think

beacon_node/lighthouse_network/gossipsub/src/behaviour.rs Outdated Show resolved Hide resolved
@@ -2706,8 +2712,8 @@ where

for peer_id in recipient_peers {
let Some(peer) = self.connected_peers.get_mut(peer_id) else {
tracing::error!(peer = %peer_id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of removing this error, can we instead filter above for the gossip promises? I.e.

// Gossip promises are kept for disconnected peers.
let iwant_peers = self
            .gossip_promises
            .peers_for_message(msg_id)
            .iter()
            .filter(|peer_id| self.connected_peers.contains_key(peer_id));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this. I don't mind.
The logic that went through my head was:
Adding this check adds a hashmap lookup for each peer, but a few lines down the page we do the exact same check.

I figured we are already doing the check, by just removing the error, we have no extra hash lookup, however the cost is we don't log an error in case a mesh peer isn't in the connected mapping (but we have these errors elsewhere).

Its such a tiny difference, so happy to do whatever you think is best, just explaining why I removed the error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for elaborating age! Do you know where else we check? It def doesn't make sense to double check this adding another iteration as you refer, I just think we shouldn't stop logging for this event

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah happy to do what you prefer.

I can't think of a way to keep the error without adding an extra hashmap lookup. Its a trivial lookup, and maybe the error is worth it. So I'm happy to go with the soln you have suggested here if you want the error.

Co-authored-by: João Oliveira <hello@jxs.pt>
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw age, can we also add an entry to the changelog?

@@ -2706,8 +2712,8 @@ where

for peer_id in recipient_peers {
let Some(peer) = self.connected_peers.get_mut(peer_id) else {
tracing::error!(peer = %peer_id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for elaborating age! Do you know where else we check? It def doesn't make sense to double check this adding another iteration as you refer, I just think we shouldn't stop logging for this event

@AgeManning AgeManning force-pushed the supress-invalid-gossipsub-error branch from 86fa0b4 to 47039ef Compare September 3, 2024 04:35
@AgeManning
Copy link
Member Author

I've added the changelog :). Feel free to add the error back if you like.

@AgeManning AgeManning requested a review from jxs September 8, 2024 23:47
@AgeManning
Copy link
Member Author

@jxs - I didnt realise we haven't merged this. We probably want this in

@AgeManning
Copy link
Member Author

@Mergifyio queue

Copy link

mergify bot commented Oct 28, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

mergify bot added a commit that referenced this pull request Oct 29, 2024
@AgeManning
Copy link
Member Author

@Mergifyio queue

Copy link

mergify bot commented Oct 29, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@AgeManning AgeManning added ready-for-merge This PR is ready to merge. v6.0.0 New major release for hierarchical state diffs and removed ready-for-review The code is ready for review labels Oct 29, 2024
@AgeManning
Copy link
Member Author

@Mergifyio refresh

Copy link

mergify bot commented Oct 29, 2024

refresh

✅ Pull request refreshed

@AgeManning
Copy link
Member Author

https://github.com/Mergifyio refresh

Copy link

mergify bot commented Oct 29, 2024

refresh

✅ Pull request refreshed

@AgeManning
Copy link
Member Author

@Mergifyio queue

Copy link

mergify bot commented Oct 30, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 8d7b3dd

@mergify mergify bot merged commit 8d7b3dd into sigp:unstable Oct 30, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v6.0.0 New major release for hierarchical state diffs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants