Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Shard DHT #3104

Open
wants to merge 10 commits into
base: v6/develop
Choose a base branch
from

Conversation

zeroxbt
Copy link
Contributor

@zeroxbt zeroxbt commented Mar 12, 2024

Description

This PR replaces the libp2p-kad-dht dependency with a modified version, which attempts to simplify the peer discovery mechanism and make it more in line with the sharding table implementation.

The problem

With the current network implementation, peer discovery is done through the libp2p-kad-dht module. Every time a node connects to our node, it is added to the routing table and the peer information is added to the peer store. This means that the peer routing table contains nodes that are potentially not in the node's sharding table, because of different blockchain implementations, or just because they don't have the required stake.
This is problematic for a number of reasons:

  1. On "findPeer()", the node will potentially get peers multiaddresses from nodes that don't have stake in the network. This increases chance of stale/false peer information being replicated.
  2. The routing table might contain peers that are of no interest to our node. For example, a node on gnosis:100 doesn't logically want nodes from otp:2043 in its routing table.
  3. On "findPeer()", nodes will respond with the searched node information, or with a list of closer peers if not found. This list of closer peers might again contain nodes that we don't necessarily care about.
  4. On "peer:connect" events, the node will try to update sharding table information even if the peer is not present in the sharding table.

How it is solved

The libp2p-kad-dht module was copied and moved to the network module, so that we can make changes to it (i left it as separate package for simplicity, but can be change obviously). All unnecessary functions and dependencies were removed, and the routing table was changed from KBucket list to a simple Map of peer ids with their respective blockchain implementations. The routing table is now updated same way as for the local sharding table, so on NodeAdded and NodeRemoved events.
At start, the bootstrap nodes will be added to the Shard DHT, and all nodes from the sharding table will be added to the Shard DHT routing table.

  1. On "findPeer()", the node will filter out unwanted peers from the closer nodes list
  2. Nodes will only be added to the routing table following the same logic as for the sharding table
  3. On "findPeer()", nodes will respond with a list of closer peers that have same blockchain implementations as the requester node.
  4. On "peer:connect" event, the node will only update last seen and last dialed values in the sharding table if the peer is in the sharding table.

How to review

As this change involved the copying of the whole libp2p-kad-dht module there are a lot of new files and it's hard to review on github, so I suggest looking at individual commits after "solving linter errors".

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Run nodes in local environment, and ensure they are connecting to each other as expected. All nodes with same blockchain implementation should be able to discover and connect to each other, while nodes with different implementations should not.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@zeroxbt
Copy link
Contributor Author

zeroxbt commented Mar 12, 2024

There is an alternative solution to this which is to create a new libp2p protocol for peer routing, as we did for content routing (GET protocol) and completely remove the DHT from the network module. This would definetly be a better option as we would have better separation of concerns (netork module and blockchain module completely separate), and would not need to have the routing table in addition to the sharding table.
The issue with this approach is that it wouldn't be backwards compatible, and nodes would still have to run the DHT to respond to FIND_NODE messages from nodes on older versions, which completely defeats the purpose.
@djordjekovac please review with the team. I'll gladly implement the new route if you find a way to make it backwards compatible

@djordjekovac
Copy link
Contributor

djordjekovac commented Mar 25, 2024

Hi @zeroxbt, i'm not sure that i understand need for this. Also we want to allow nodes that don't have stake to perform get on the network (light nodes), are these changes going to affect this?

  1. On "findPeer()", the node will potentially get peers multiaddresses from nodes that don't have stake in the network. This increases chance of stale/false peer information being replicated.

When searching for the neighbourhood for the publish or get we are only considering nodes that are in sharding table (nodes that have stake). So i don't think that we will have this issue.

  1. The routing table might contain peers that are of no interest to our node. For example, a node on gnosis:100 doesn't logically want nodes from otp:2043 in its routing table.
  2. On "findPeer()", nodes will respond with the searched node information, or with a list of closer peers if not found. This list of closer peers might again contain nodes that we don't necessarily care about.

You are right for this. But also i think that network module should not be aware of blockchains that node is connected to. They should work independently and network should be in charge of sending and receiving message, and then somewhere on different layer we should do some processing depending on blockchain.

  1. On "peer:connect" events, the node will try to update sharding table information even if the peer is not present in the sharding table.

Also this is true and it would be good that we don't do this.

Thanks for sending this PR, i will try to reply faster in the future :-)

@zeroxbt
Copy link
Contributor Author

zeroxbt commented Mar 25, 2024

Hi @zeroxbt, i'm not sure that i understand need for this. Also we want to allow nodes that don't have stake to perform get on the network (light nodes), are these changes going to affect this?

  1. On "findPeer()", the node will potentially get peers multiaddresses from nodes that don't have stake in the network. This increases chance of stale/false peer information being replicated.

When searching for the neighbourhood for the publish or get we are only considering nodes that are in sharding table (nodes that have stake). So i don't think that we will have this issue.

  1. The routing table might contain peers that are of no interest to our node. For example, a node on gnosis:100 doesn't logically want nodes from otp:2043 in its routing table.
  2. On "findPeer()", nodes will respond with the searched node information, or with a list of closer peers if not found. This list of closer peers might again contain nodes that we don't necessarily care about.

You are right for this. But also i think that network module should not be aware of blockchains that node is connected to. They should work independently and network should be in charge of sending and receiving message, and then somewhere on different layer we should do some processing depending on blockchain.

  1. On "peer:connect" events, the node will try to update sharding table information even if the peer is not present in the sharding table.

Also this is true and it would be good that we don't do this.

Thanks for sending this PR, i will try to reply faster in the future :-)

Hey @djordjekovac, light nodes still work, as they will be able to find all nodes in the sharding table. What changes is that light nodes are not used to "spread" information about other nodes on the network anymore.

For your second question, it is true that we use sharding table for the routing of content ("publish", "get" and "update"). The problem here is that for the routing of peers we still use ("findPeer"), which uses the KadDHT and not the sharding table. This means that the node asks for peers information like IP, protocols etc. to all nodes in the DHT. The main problem, which I am attempting to solve with this PR, is that we should only rely on sharding table nodes for peers information, same as we do for content.

For the third comment I completely agree. I left a comment regarding this. There is a way to do peer routing using a new protocol route, so we could remove the DHT completely by exchanging peer information through a new "get" route, which would mean only querying nodes in the sharding table. The only problem with this is that it is not backwards compatible: if we remove DHT from new nodes, nodes on older versions won't be able to find information about nodes on newer versions, and vice versa.

@zeroxbt
Copy link
Contributor Author

zeroxbt commented Mar 25, 2024

If this is not clear I can give a practical example:

Sharding table nodes for NeuroWeb: [A, B, C]
Sharding table nodes for Gnosis: [B, C, D, E, F, G]
Light nodes for Gnosis: [H, I]

-> Node A joins the network
-> Node A will connect to bootstrap and ask "give me information about all nodes which are closest to me" (done by libp2p)
-> Bootstrap sends closest nodes to node A, for ex: [B, D, E, H]
-> Node A asks peer information about node C to its closest nodes, for ex: [D, E, H]
-> Node H sends information to A about node C
-> Node A stored information about node C

You can see from this example that right now, peer information is exchanged between all nodes on the network, independently from the chosen blockchain or if node is in sharding table or not. This is not great, as we don't really want to trust node H (or any other node that is not present in our sharding table) to give us peer information.
Same as for content routing, it would be best if peer routing is restricted to sharding table nodes.

I hope it's clearer now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants