-
Notifications
You must be signed in to change notification settings - Fork 741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct gossipsub mesh and connected peer inconsistencies #6244
Correct gossipsub mesh and connected peer inconsistencies #6244
Conversation
Ok, found the other bug. This should fix the libp2p issues I think |
@@ -2706,8 +2712,8 @@ where | |||
|
|||
for peer_id in recipient_peers { | |||
let Some(peer) = self.connected_peers.get_mut(peer_id) else { | |||
tracing::error!(peer = %peer_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of removing this error, can we instead filter above for the gossip promises? I.e.
// Gossip promises are kept for disconnected peers.
let iwant_peers = self
.gossip_promises
.peers_for_message(msg_id)
.iter()
.filter(|peer_id| self.connected_peers.contains_key(peer_id));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this. I don't mind.
The logic that went through my head was:
Adding this check adds a hashmap lookup for each peer, but a few lines down the page we do the exact same check.
I figured we are already doing the check, by just removing the error, we have no extra hash lookup, however the cost is we don't log an error in case a mesh peer isn't in the connected mapping (but we have these errors elsewhere).
Its such a tiny difference, so happy to do whatever you think is best, just explaining why I removed the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for elaborating age! Do you know where else we check? It def doesn't make sense to double check this adding another iteration as you refer, I just think we shouldn't stop logging for this event
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So yeah happy to do what you prefer.
I can't think of a way to keep the error without adding an extra hashmap lookup. Its a trivial lookup, and maybe the error is worth it. So I'm happy to go with the soln you have suggested here if you want the error.
Co-authored-by: João Oliveira <hello@jxs.pt>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw age, can we also add an entry to the changelog?
@@ -2706,8 +2712,8 @@ where | |||
|
|||
for peer_id in recipient_peers { | |||
let Some(peer) = self.connected_peers.get_mut(peer_id) else { | |||
tracing::error!(peer = %peer_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for elaborating age! Do you know where else we check? It def doesn't make sense to double check this adding another iteration as you refer, I just think we shouldn't stop logging for this event
86fa0b4
to
47039ef
Compare
I've added the changelog :). Feel free to add the error back if you like. |
@jxs - I didnt realise we haven't merged this. We probably want this in |
@Mergifyio queue |
🛑 The pull request has been removed from the queue
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@Mergifyio queue |
🛑 The pull request has been removed from the queue
|
@Mergifyio refresh |
✅ Pull request refreshed |
https://github.com/Mergifyio refresh |
✅ Pull request refreshed |
@Mergifyio queue |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at 8d7b3dd |
Issue Addressed
We are seeing errors in gossipsub related to an inconsistent state between mesh peers and connected peers.
There appears to be a few problems. This PR will be updated as we find all the bugs.
The first I have noticed is that the IDONTWANT messages also look for in-flight gossipsub promises. When a peer disconnects we dont remove the tracking of in-flight gossipsub promises, because it is inefficient search (it's not index'd by peer-id). Therefore we can have disconnected peers in this mapping and if we go to send an IDONTWANT message, we will not have this peer-id in the connected mapping.
Am currently still searching for the others....