A pithy primer of `Polykey` #777

tegefaulkes · 2024-07-25T01:04:10Z

tegefaulkes
Jul 25, 2024
Maintainer

Polykey is a fairly complex program. It is made up of groups of functionality we call domains. Each domain is a collection of classes, utilities and types, though some consist of a single class.

For example, we have a domain called keys. It handles the base-level password, key and certificate information that the rest of Polykey makes use of. It contains two main classes, the CertManager and the KeyRing. What they each do is self-explanatory and I'll leave it to you to dig deeper.

Keep in mind that, for the most part, each of these classes is decorated with an async-init decorator that gives the class a life-cycle. So the KeyRing is a CreateDestroyStartStop structure, which just means that it implements a createKeyRing static factory function for creating the class and has start, stop and destroy methods. Any method inside KeyRing that is decorated with @ready can only be called while the KeyRing is in the running state. Calling stop will stop any active state, and calling destroy will clear any persistent state. That's all you really need to know for now. You will see this pattern everywhere in Polykey and even in other projects.
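To make the life-cycle concrete, here is a minimal hand-rolled sketch. It does not use the real async-init decorators; `KeyRingSketch`, `assertReady` and the placeholder key are hypothetical names for illustration only, and the real classes do considerably more.

```typescript
// Hypothetical sketch of a CreateDestroyStartStop-style life-cycle.
// This is NOT the async-init API; it only mimics the behaviour described above.
type LifeCycleStatus = 'stopped' | 'running' | 'destroyed';

class KeyRingSketch {
  protected status: LifeCycleStatus = 'stopped';
  // Stand-in for persistent state that survives stop() but not destroy().
  protected persisted: Map<string, string> = new Map();

  // Static factory: constructs the instance and starts it.
  public static async createKeyRingSketch(): Promise<KeyRingSketch> {
    const keyRing = new KeyRingSketch();
    await keyRing.start();
    return keyRing;
  }

  public async start(): Promise<void> {
    if (this.status === 'destroyed') throw new Error('already destroyed');
    this.persisted.set('publicKey', 'placeholder-public-key');
    this.status = 'running';
  }

  public async stop(): Promise<void> {
    if (this.status === 'running') this.status = 'stopped';
  }

  public async destroy(): Promise<void> {
    this.persisted.clear();
    this.status = 'destroyed';
  }

  // Plays the role of the @ready decorator: guard methods on the running state.
  protected assertReady(): void {
    if (this.status !== 'running') throw new Error('KeyRingSketch is not running');
  }

  public getPublicKey(): string {
    this.assertReady();
    return this.persisted.get('publicKey') as string;
  }
}
```

Calling `getPublicKey` after `stop` throws in this sketch, which is the behaviour the @ready guard is meant to give you.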

Polykey can be subdivided into a few aspects.

Cryptography features for handling keys, passwords, certificates and claims, handled by the keys, claims and sigchain domains.
Networking code for making and handling connections between nodes, handled by the nodes and discovery domains.
Tracking relationships and permissions, handled by the gestalts, sigchain and ACL domains.
Storing encrypted secrets, handled by the vaults domain.
Client level control and communication, handled via the client domain.

There are some others but those are the main ones. To see how they are all created and depend on each other, look at Polykey/src/PolykeyAgent.ts (line 244 in 2a88416), inside the main try {} catch block of the createPolykeyAgent method.

Arguably the most important domain for Polykey as a product is the vaults domain. It's where all of the secret handling is, and it's maybe one of the easier domains to read through. For that you'll want to start at the VaultManager https://github.com/MatrixAI/Polykey/blob/staging/src/vaults/VaultManager.ts. Get a feel for that and then dig down into VaultInternal and how it handles the data. Keep an eye out for the withF pattern. You'll see that all over the place.
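If the withF pattern is new to you, the idea is roughly "acquire a resource, run a function with it, always release it afterwards". The following is a simplified sketch under that assumption; `withFSketch`, `AcquireSketch` and `acquireVaultLock` are hypothetical names and the real helper has a richer signature.

```typescript
// Hypothetical, simplified version of the withF resource pattern.
// An acquirer resolves to a release function together with the resource.
type ReleaseSketch = () => Promise<void>;
type AcquireSketch<T> = () => Promise<[ReleaseSketch, T]>;

async function withFSketch<T, R>(
  acquire: AcquireSketch<T>,
  f: (resource: T) => Promise<R>,
): Promise<R> {
  const [release, resource] = await acquire();
  try {
    return await f(resource);
  } finally {
    // Release runs whether f resolved or threw.
    await release();
  }
}

// Illustrative resource: records when it is acquired and released.
const events: string[] = [];
const acquireVaultLock: AcquireSketch<string> = async () => {
  events.push('acquire');
  const release: ReleaseSketch = async () => {
    events.push('release');
  };
  return [release, 'vault-handle'];
};
```

The benefit is that callers never have to remember to release the lock or handle themselves, even on error paths.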

After that, the nodes domain is the next most important. It handles all of the logic for managing connections in the NodeConnectionManager, tracking data about other nodes in the NodeGraph, and finding the nodes you want to connect to and establishing those connections in the NodeManager. All of the connection-related stuff is pretty complex, so just skim over most of it. Dig into how the NodeConnectionManager https://github.com/MatrixAI/Polykey/blob/staging/src/nodes/NodeConnectionManager.ts handles creating connections and how the NodeConnection https://github.com/MatrixAI/Polykey/blob/staging/src/nodes/NodeConnection.ts wraps them. It uses an object map and locking to ensure that we don't create duplicate NodeConnections. Skim over what NodeManager handles, especially the logic for finding nodes and NodeConnectionQueue; it's pretty complex. Besides that, some good related reading is the Kademlia spec https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf. Get an idea of how closeness works as a concept in Kademlia.
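The "object map and locking" idea can be sketched as follows. This is an assumption about the general technique, not the NodeConnectionManager code: `ConnectionMapSketch` and `establish` are hypothetical, and here a shared pending promise stands in for the per-node lock.

```typescript
// Hypothetical sketch: avoid duplicate connections by keying them in a map.
interface ConnectionSketch {
  nodeId: string;
}

class ConnectionMapSketch {
  protected connections: Map<string, Promise<ConnectionSketch>> = new Map();
  // Counts how many connections were actually established.
  public created = 0;

  // Concurrent callers asking for the same node share one pending promise,
  // so only the first caller triggers the expensive connection setup.
  public getConnection(nodeId: string): Promise<ConnectionSketch> {
    let existing = this.connections.get(nodeId);
    if (existing == null) {
      existing = this.establish(nodeId);
      this.connections.set(nodeId, existing);
    }
    return existing;
  }

  protected async establish(nodeId: string): Promise<ConnectionSketch> {
    this.created += 1;
    // Stand-in for the real handshake.
    await Promise.resolve();
    return { nodeId };
  }
}
```

Two simultaneous `getConnection('node-a')` calls resolve to the same object in this sketch, and `created` stays at 1.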

Polykey uses a lot of persistent state, which is stored in the DB as encrypted data. So to get a good feel for which domains need data persistence, just trace what depends on the DB. Alongside this, the keys domain maintains most of the information used for the cryptography functions. It stores the private and public keys, but also manages encryption keys for the DB and vaults domain.

Polykey needs to track and maintain relationships and permissions between nodes. There are a few parts to this. One of the main ones is claims. In essence, a claim is a statement signed by one or more nodes, usually to state that two nodes own each other in a way that forms a gestalt. Other kinds of claims will be made, but they can come later. Claims are stored in the SigChain, which functions similarly to a blockchain: immutability of the chain is enforced by each claim including a hash of its parent. So the history can't be modified without breaking the chain.
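As a rough illustration of the parent-hash idea, here is a toy chain. Everything in it is hypothetical: `ClaimSketch` ignores signatures entirely, and `toyHash` is a small non-cryptographic FNV-1a hash used only to keep the example self-contained, where a real chain would use a cryptographic hash.

```typescript
// Hypothetical sketch of a hash-linked claim chain (no signatures).
interface ClaimSketch {
  payload: string;
  prevHash: string | null; // hash of the parent claim, null for the first claim
}

// Toy FNV-1a hash. NOT cryptographically secure; a stand-in for a real hash.
function toyHash(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

function hashClaim(claim: ClaimSketch): string {
  return toyHash(JSON.stringify([claim.payload, claim.prevHash]));
}

// Appending links the new claim to the hash of the current chain tip.
function appendClaim(chain: ClaimSketch[], payload: string): ClaimSketch[] {
  const parent = chain[chain.length - 1];
  const prevHash = parent == null ? null : hashClaim(parent);
  return [...chain, { payload, prevHash }];
}

// The chain is valid only if every claim references its parent's hash.
function verifyChain(chain: ClaimSketch[]): boolean {
  for (let i = 0; i < chain.length; i++) {
    const expected = i === 0 ? null : hashClaim(chain[i - 1]);
    if (chain[i].prevHash !== expected) return false;
  }
  return true;
}
```

Editing the payload of an earlier claim changes its hash, so the next claim's `prevHash` no longer matches and `verifyChain` returns false.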

The ACL tracks permissions we give to other nodes. The main example of this is the permission to see and clone vaults from our node. It is a simple access control list that maps a permission to a NodeID.
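In its simplest form that mapping could look like the sketch below. The class name, method names and the permission names are assumptions for illustration; the real ACL is persisted in the DB and has more structure.

```typescript
// Hypothetical sketch of an access control list keyed by node ID.
type PermissionSketch = 'scan' | 'clone' | 'pull';

class ACLSketch {
  protected perms: Map<string, Set<PermissionSketch>> = new Map();

  public setPermission(nodeId: string, permission: PermissionSketch): void {
    const set = this.perms.get(nodeId) || new Set<PermissionSketch>();
    set.add(permission);
    this.perms.set(nodeId, set);
  }

  public unsetPermission(nodeId: string, permission: PermissionSketch): void {
    const set = this.perms.get(nodeId);
    if (set != null) set.delete(permission);
  }

  // Nodes with no entry are denied by default.
  public isAllowed(nodeId: string, permission: PermissionSketch): boolean {
    const set = this.perms.get(nodeId);
    return set != null && set.has(permission);
  }
}
```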

The gestalts domain keeps track of claims between nodes within a gestalt. A gestalt is a graph formed of all nodes that hold claims between each other, so a collection of nodes can be considered as a whole. Given how the structure works, you can't really reference a gestalt as a whole; you can only reference it by a member or check if two nodes are part of the same gestalt. The GestaltGraph tracks our own gestalt but also other nodes' gestalts. For the most part we only care about our own, or gestalts we have a first-order relationship with via social or permission links.

Following on from gestalts we have the discovery domain. This handles the logic for exploring and mapping these claims and links between nodes. It fills in the GestaltGraph with this information and makes sure it's maintained.

Have a read over this @aryanjassal @brynblack. If you have any questions, just ask them in the comments here. They'll be a great reference for later.

2024-07-25T01:04:13Z

linear[bot]
bot Jul 25, 2024

ENG-369 A pithy primer of `Polykey`

0 replies

tegefaulkes · 2024-07-25T01:05:32Z

tegefaulkes
Jul 25, 2024
Maintainer Author

This might function better as a discussion. Maybe I'll convert it.

0 replies

CMCDragonkai · 2024-07-25T08:03:26Z

CMCDragonkai
Jul 25, 2024
Maintainer

Comments please @aryanjassal @brynblack.

0 replies

aryanjassal · 2024-07-30T00:29:47Z

aryanjassal
Jul 30, 2024
Maintainer

> Following on from gestalts we have the discovery domain. This handles the logic for exploring and mapping these claims and links between nodes. It fills in the GestaltGraph with this information and makes sure it's maintained.

I'm unable to understand how the discovery domain works. How does it fill in the GestaltGraph? With what information? What is a "maintained" state for the GestaltGraph and how does the discovery domain ensure that the "maintained" state is achieved?

6 replies

aryanjassal Jul 30, 2024
Maintainer

> That's what the discovery domain is for. It will walk this graph, similar to a crawler, and ask each node for their sigchain. It then extracts the claim information, updates the gestalts domain with what it discovered and then adds the linked nodes found in the claims to a queue to be discovered later.

Which queue are we adding the discovered linked nodes to? If we have already discovered the node, then why are we adding it to a queue to discover later? Is it to ensure that the GestaltGraph remains updated by polling the graph at intervals?

> Any other gestalts don't matter all that much so we don't need to explore them. In fact, if we discovered all gestalts then we'd have to hold information about the whole network within a node. As you can imagine this won't scale well.

So all the information of a gestalt the current node is a part of, a gestalt we share permissions and trust with, and gestalts which we have a social link with, are all stored within a single node. Won't that scale poorly with a large network? Say Microsoft has begun using Polykey. With the scale of the corporation, there would be thousands, maybe hundreds of thousands of users. Won't the performance or scalability of the network take a massive hit?

Also, I am aware how a node can be a part of a gestalt, or a network of trusted nodes. This is similar to priority no. 2 from what I can understand. However, what does having a "social link" with another node mean? How is it different to trusting them, thus introducing that node within a gestalt network?

> The gestalt the current node is a part of.

The wording here implies that a node can only be a part of a single gestalt. If a gestalt refers to a graph of nodes that hold claim to each other, why can't a node be a part of multiple gestalts? If they are a part of multiple gestalts, then won't discovering the multiple gestalts once again put more load on the singular node meant for storing the graph structure from the discovery process?

tegefaulkes Jul 30, 2024
Maintainer Author

Which queue are we adding the discovered linked nodes to? - The discovery domain works through a priority queue to explore a graph of nodes. It's an implementation detail. And yes, we re-discover nodes after a while to check for updates.
So all the information of a gestalt the current node is a part of,... - A node only really cares about nodes it directly interacts with. Unless an org uses a single node to distribute secrets to everything, which won't scale, you won't see the gestalt graph hitting scaling issues.
Also, I am aware how a node can be a part of a gestalt, or a... - Social links are formed between identities in a gestalt. For example, a GitHub user is an identity. It can have followers or follows, so we want to be aware if any of them have their own Polykey nodes and gestalts. But in this case, we care less about knowing the entire gestalt unless the user applies some kind of trust to it.
The wording here implies that a node can only be a part of a single gestalt. ... - By definition a gestalt is a fully connected graph of nodes with claims between them. If a claim was made between two nodes in two separate gestalts, then by definition the two gestalts would merge into one. That said, in future, there can be different classes of claims, such as a network access claim. That can also be considered a distinct gestalt that forms the entire private network of nodes, and this network gestalt can be separate from the identity gestalt we currently have. But right now the only gestalts we have are the identity gestalts as I've described.
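For a feel of the priority-queue crawl, here is a small synchronous sketch. It is not the discovery domain's code: `discoverSketch`, the numeric priorities and the `claims` lookup table (standing in for asking each node for its sigchain) are all assumptions, and periodic re-discovery is left out.

```typescript
// Hypothetical sketch of a priority-driven crawl over claim links.
// Lower number = more important: 0 own gestalt, 1 trusted nodes, 2 social links.
interface DiscoveryEntrySketch {
  nodeId: string;
  priority: number;
}

function discoverSketch(
  claims: Record<string, string[]>, // stand-in for each node's sigchain
  seeds: DiscoveryEntrySketch[],
): string[] {
  const queue = [...seeds];
  const visited = new Set<string>();
  const order: string[] = [];
  while (queue.length > 0) {
    // A real implementation would use a proper priority queue instead of sorting.
    queue.sort((a, b) => a.priority - b.priority);
    const { nodeId, priority } = queue.shift()!;
    if (visited.has(nodeId)) continue;
    visited.add(nodeId);
    order.push(nodeId);
    // Nodes linked by claims inherit the priority of the node that led to them.
    for (const linked of claims[nodeId] || []) {
      if (!visited.has(linked)) queue.push({ nodeId: linked, priority });
    }
  }
  return order;
}
```

With our own node seeded at priority 0 and a trusted node at priority 1, the sketch finishes walking our own gestalt before it touches the trusted node's gestalt.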

aryanjassal Jul 30, 2024
Maintainer

> It will keep doing this in the background to discover entire gestalts and keep their details updated. In order of importance it discovers the following.
>
> The gestalt the current node is a part of.
>
> Gestalts of any nodes we share permissions and trust with.
>
> Gestalts of any nodes we have social links with.

I now know that a social link is merely (in this example) GitHub followers and follows being acknowledged by the discovery system, which learns whether they have their own gestalt or not. So, if one of our followers is in a different gestalt to us, and we trust their node, their gestalt will merge with our gestalt, correct? If this is the case, won't this place a large amount of load on the singular network responsible for storing all the information within a gestalt, given how easily a gestalt expands? Won't the network eventually have nodes from different gestalts trusting each other, leading to their gestalts being merged constantly until there is only one gestalt as large as the public network, leading to the scenario where a singular node contains the information of all the nodes present in the entire network?

Also, how will gestalt work with private networks like the ones in an enterprise? One node might need to trust an external node for whatever reason, but does that mean that the enterprise network is now a part of the entire public gestalt (if it was isolated before)?

The first priority to scan by the discovery crawler is the gestalt our node is a part of. The second priority scans any nodes we share permissions/trust with. Won't that automatically unify the gestalt of the target node with the gestalt of our node?

tegefaulkes Jul 30, 2024
Maintainer Author

You're confusing claims and trust. Claims are explicit claims that two nodes own each other, which links them within a gestalt. Trust just means you gave a permission to a node. In the case of trust we want to be aware of their gestalt but it doesn't merge the gestalts together.
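The difference can be sketched with a tiny graph model. This is only an illustration of the concept, with hypothetical names (`GestaltGraphSketch`, `addClaim`, `trust`, `gestaltOf`): claims add edges, trust is just a flag on a node, and a gestalt is whatever is reachable over claim edges.

```typescript
// Hypothetical sketch: claims change gestalt membership, trust does not.
class GestaltGraphSketch {
  protected claims: Map<string, Set<string>> = new Map();
  protected trusted: Set<string> = new Set();

  // A claim links two nodes in both directions.
  public addClaim(a: string, b: string): void {
    this.neighbours(a).add(b);
    this.neighbours(b).add(a);
  }

  // Trust records a permission only; no edge is added.
  public trust(nodeId: string): void {
    this.trusted.add(nodeId);
  }

  public isTrusted(nodeId: string): boolean {
    return this.trusted.has(nodeId);
  }

  // The gestalt of a node is everything reachable through claims.
  public gestaltOf(nodeId: string): string[] {
    const seen = new Set<string>([nodeId]);
    const queue = [nodeId];
    while (queue.length > 0) {
      const current = queue.shift()!;
      this.neighbours(current).forEach((next) => {
        if (!seen.has(next)) {
          seen.add(next);
          queue.push(next);
        }
      });
    }
    return Array.from(seen).sort();
  }

  protected neighbours(nodeId: string): Set<string> {
    let set = this.claims.get(nodeId);
    if (set == null) {
      set = new Set();
      this.claims.set(nodeId, set);
    }
    return set;
  }
}
```

Trusting a node in another gestalt leaves both gestalts unchanged here, while a single claim between them joins the two into one.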

We'll have to consider how gestalts interact with private networks. But that's a design problem for later and out of scope for this discussion.

It's great that you're diving deeper into specific parts but let's try to stick to a higher level overview of the systems for now.

aryanjassal Jul 30, 2024
Maintainer

> You're confusing claims and trust. Claims are explicit claims that two nodes own each other, which links them within a gestalt. Trust just means you gave a permission to a node. In the case of trust we want to be aware of their gestalt but it doesn't merge the gestalts together.

Okay, so a gestalt is a network of claims. Say that I have claimed a node using my enterprise network and my public network. Then, my gestalt will have both the nodes. Trust, on the other hand, basically gives permissions to interact with only a singular node within a given gestalt, and trusting another node will not affect the gestalt's structure, but claiming another node will. That makes sense.

aryanjassal · 2024-07-30T00:54:02Z

aryanjassal
Jul 30, 2024
Maintainer

I've taken a look at the Kademlia paper and even asked ChatGPT to explain it to me, but I am struggling to grasp how the distance metric works in relation to a binary tree.

In this image, the physical distance from 0000 to 0001 is the same as from 0000 to 0010 (being one node away) but why does the distance between 0000 and 0001 equal 1 and the distance between 0000 and 0010 equal 2? I understand how XOR operation works, but how does this XOR operation relate to the "closeness" of two nodes if it gives differing results for two equidistant nodes?

Similarly, 0100 and 1000 are also one node away, and still, how is the distance between them 12?
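For reference, the numbers in question can be reproduced with the short sketch below (the function names are made up for illustration). The XOR value is dominated by the highest bit in which the two IDs differ, which corresponds to how far up the binary tree you must climb before the two leaves share an ancestor: 0000 and 0001 meet at their parent, 0000 and 0010 only meet one level higher, and 0100 and 1000 only meet at the root.

```typescript
// IDs are passed as bit strings, e.g. '0100'.
function xorDistance(a: string, b: string): number {
  return parseInt(a, 2) ^ parseInt(b, 2);
}

// Height of the smallest subtree that contains both IDs
// (number of bits needed to write the XOR distance).
function sharedSubtreeHeight(a: string, b: string): number {
  let d = xorDistance(a, b);
  let height = 0;
  while (d > 0) {
    d >>>= 1;
    height += 1;
  }
  return height;
}

// xorDistance('0000', '0001') is 1, shared subtree height 1
// xorDistance('0000', '0010') is 2, shared subtree height 2
// xorDistance('0100', '1000') is 12, shared subtree height 4 (the root)
```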

9 replies

tegefaulkes Jul 30, 2024
Maintainer Author

You're getting it now. Though the relations between connections and nodes within a bucket are a little more vague than that. This is all data about the network rather than hard connections. But yeah, for the sake of example we can assume that if a node is in the graph then we can make a connection to it.

As for how nodes find each other within a network using this data structure, read up on the node find operation within the Kademlia spec. But it boils down to:

Check for the target node within my buckets. If found then done; otherwise get the 20 closest nodes to the target and add them to a priority queue based on closeness.
Pop the closest node in the queue and ask it for its 20 closest nodes to the target. Add these nodes to the priority queue; if the target is found then end.
Continue asking until either the queue is exhausted, 20 nodes have been asked, or the target is found.

Basically, if you can't find your target, you ask a node that is closer to it because they're more likely to have the node, or know a node that's closer still. Imagine a k* walk over a graph getting closer to the target each step.
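The three steps above can be sketched over an in-memory network. This is a simplification under stated assumptions, not the NodeManager implementation: node IDs are small numbers, `network` is a hypothetical lookup of "which nodes does this node know about", and asking a node is a plain map read rather than an RPC.

```typescript
// Hypothetical sketch of the iterative find-node loop.
function findNodeSketch(
  network: Map<number, number[]>, // nodeId -> IDs that node knows about
  selfId: number,
  target: number,
  limit = 20,
): { found: boolean; asked: number[] } {
  // Sort by XOR distance to the target and keep the closest `limit` nodes.
  const closest = (known: number[]): number[] =>
    [...known].sort((a, b) => (a ^ target) - (b ^ target)).slice(0, limit);

  // Step 1: check our own buckets first.
  const own = network.get(selfId) || [];
  if (own.indexOf(target) !== -1) return { found: true, asked: [] };
  let queue = closest(own);
  const seen = new Set<number>([selfId, ...queue]);
  const asked: number[] = [];

  // Steps 2 and 3: keep asking the closest unasked node.
  while (queue.length > 0 && asked.length < limit) {
    const next = queue.shift()!;
    asked.push(next);
    const reported = closest(network.get(next) || []);
    if (reported.indexOf(target) !== -1) return { found: true, asked };
    for (const id of reported) {
      if (!seen.has(id)) {
        seen.add(id);
        queue.push(id);
      }
    }
    queue = closest(queue);
  }
  return { found: false, asked };
}
```

In a toy network where node 0 knows 8 and 4, node 8 knows 12, and node 12 knows 15, a search from 0 for 15 asks 8 and then 12 before the target turns up, moving closer on each hop.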

aryanjassal Jul 30, 2024
Maintainer

So, instead of brute-forcing by searching each node's connections, we instead sort the connected nodes by closeness (using the proposed XOR metric) and then, for each node within that sorted queue, we sort all their connections by closeness and pop the 20 closest nodes. We keep repeating these steps for each node until either the queue has been exhausted (all connected nodes to the first 20 nodes have been explored) or the target has been found. Is my understanding correct?

How does this solve the problem of having too many nodes to look through? There should be a timeout and, from my knowledge, there is one implemented already. Also, if the target node is reachable by the 21st closest node, then won't the network never be able to find the target node? As such, how does the network deal with nodes that are unreachable either due to timeout or due to not being in the connections of the first 20 nodes?

Also, what is a K* walk? I could not find any resources relating to K* walk. Did you mean A* walk?

tegefaulkes Jul 30, 2024
Maintainer Author

My bad, I misremembered, it's the A* algorithm.

Yeah, you pretty much have it down.

As for scalability, you're not naively searching the entire network. You're searching it in an informed way that reduces the amount of useless work you need to do, since we're walking closer each search step. With how the buckets are structured and the closeness network, you should only need to take a number of steps that are logarithmic to the network size. This means the process is extremely scalable. You should be able to find the target in much less than 20 steps.

See https://en.wikipedia.org/wiki/Six_degrees_of_separation The idea is sometimes generalized to the average 'social distance' being 'logarithmic' in the size of the population.

aryanjassal Jul 30, 2024
Maintainer

> With how the buckets are structured and the closeness network, you should only need to take a number of steps that are logarithmic to the network size.

Taking the natural log of the potential maximum size of the network (2^256), we get approximately 180. Meaning, approximately we will need to traverse a maximum of 180 steps.

> See https://en.wikipedia.org/wiki/Six_degrees_of_separation The idea is sometimes generalized to the average 'social distance' being 'logarithmic' in the size of the population.

This also assumes that there is no cap to the number of connected nodes. One actor can know only one other actor, or can know 90% of all actors. This is different to Polykey, where each node can only know up to 20 nodes from each bucket (or a total of 5000 nodes). Moreover, this idea applies to the average "distance".

I actually implemented a program which takes in a large list of actors and the movies they starred in. While I couldn't find any connection exceeding 6 stages of separation, I was also analysing the entire list of actors connected to our actor, which is not the case in Polykey as we only select the top 20 nodes based on closeness.

How do we handle worse cases where either the required steps are greater than the logarithm of the network size (180) or are hidden deeper than the reach of 20 nodes?

tegefaulkes Jul 30, 2024
Maintainer Author

You're making some bad assumptions there. If each step only moved us down 1 bucket index of closeness (log2), then it would be 255 steps. We're not doing that; we're getting the closest nodes at each step, which will jump a whole bunch of buckets each step.
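The figures being thrown around can be checked with a few lines of arithmetic. The sketch assumes 256-bit node IDs, one bucket per bit and 20 entries per bucket, which are the numbers used in this thread; the constant names are made up.

```typescript
// Assumed parameters taken from the discussion above.
const ID_BITS = 256; // size of the node ID space is 2^256
const BUCKET_SIZE = 20; // nodes remembered per bucket

// Natural log of 2^256 is 256 * ln(2), roughly 177.4.
const naturalLogOfSpace = ID_BITS * Math.LN2;

// Base-2 log of 2^256 is exactly 256, so bucket indexes run from 0 to 255.
const maxBucketIndex = ID_BITS - 1;

// Upper bound on nodes a single node can track across all buckets.
const maxKnownNodes = ID_BITS * BUCKET_SIZE;
```

So walking down one bucket index at a time from the top bucket to the bottom one would take 255 steps, and the "about 5000 nodes" figure is 256 × 20 = 5120.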

> This also assumes that there is no cap to the number of connected nodes

No, it doesn't. It assumes reasonably sized friend groups, which are usually much lower than 5000 known people.

So yeah, in the worst case this wouldn't work. But we're not operating in the worst case; we're working on the average case here. We could have an extremely large and uneven network. We could conceivably have the full address space utilised and connected in one very long linear chain. You could never search a network like that regardless of algorithm, so we just don't consider it because we don't construct a network like that.

We could go deeper into the algorithm here but we're too deep into the weeds at this point. This is extra reading and not required for understanding the code base.

CMCDragonkai · 2024-07-30T03:02:06Z

CMCDragonkai
Jul 30, 2024
Maintainer

The discovery domain requires a bit of improvement. It's supposed to act as a decentralized crawler. Lots of optimisation required here... as well as wasm based plugins to deal with various third party platforms.

0 replies
