
Add designated node id to contenttree #65

Merged
merged 2 commits into lf-edge:main
Aug 28, 2024

Conversation

zedi-pramodh

Add designated node id to content tree
This node id is used to decide whether the content tree's content needs to be downloaded on a particular edge node. On clustered EVE devices this value determines which node is responsible for downloading the content. This is very similar to the Volume message, where the designated node id determines the node responsible for creating the Volume.

On single-node EVE installations the assumption is that the controller will pass in a non-nil value, but the code should handle nil values gracefully.
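
For illustration only, here is a minimal sketch in Go of the intended behaviour; the type, field and function names are assumptions, not the actual eve-api or EVE pubsub definitions. An empty designated node id falls back to downloading locally, which covers both the single-node case and a controller that does not set the field:

```go
package main

import "fmt"

// ContentTreeConfig is a simplified stand-in for the real content tree config;
// DesignatedNodeID represents the field added by this PR (name assumed here).
type ContentTreeConfig struct {
	ContentID        string
	DesignatedNodeID string // empty on single-node installs or older controllers
}

// shouldDownload reports whether this node is responsible for downloading the
// content tree. An empty designated node id is handled gracefully by falling
// back to "download here", i.e. the pre-cluster behaviour.
func shouldDownload(cfg ContentTreeConfig, localNodeID string) bool {
	if cfg.DesignatedNodeID == "" {
		return true
	}
	return cfg.DesignatedNodeID == localNodeID
}

func main() {
	cfg := ContentTreeConfig{ContentID: "ct-1", DesignatedNodeID: "node-A"}
	fmt.Println(shouldDownload(cfg, "node-A")) // true: this node downloads
	fmt.Println(shouldDownload(cfg, "node-B")) // false: skip the download here
}
```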

Pramodh Pallapothu added 2 commits August 23, 2024 11:02
@zedi-pramodh requested review from naiming-zededa and eriknordmark and removed the request for eriknordmark August 23, 2024 18:18
Contributor

@naiming-zededa left a comment


LGTM

@zedi-pramodh requested a review from deitch August 27, 2024 18:38
@deitch
Contributor

deitch commented Aug 27, 2024

I'm somewhat confused by this. A content tree is just a reference to content. When we send it to a node, we are saying, you want this (well, maybe connected to a volume). Doesn't it muddle the content resource if we add a node to it?

If we want to say, "here is a content tree, but don't download it", shouldn't we not send it to that node? Isn't that the job of the controller, to determine who will get the message, "download this tree"?

Conversely, if we really have to send it to multiple nodes - it would help to understand why the controller would send it to nodes A, B, C if only B will download it - then I think a separate structure would be the right place. "This is a content tree" should look identical everywhere. "This is a list of nodes that should have content tree X", where X is the reference to the aforementioned content tree, is distinct.
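
For concreteness, a rough sketch of that alternative (invented names, not an existing eve-api message): the content tree stays identical everywhere, and a separate, node-oriented structure carries the "who downloads it" decision:

```go
package placement

// ContentTree stays identical everywhere; it only describes the content.
type ContentTree struct {
	ID  string
	URL string
}

// ContentTreePlacement is the separate structure suggested above: it maps a
// content tree reference to the nodes expected to download it, leaving the
// tree itself untouched.
type ContentTreePlacement struct {
	ContentTreeID   string   // reference to ContentTree.ID
	DownloaderNodes []string // e.g. only B, even if A, B and C all receive the tree
}
```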

@zedi-pramodh
Author

This is very similar to the Volumes struct, which already has a designated node id. Consider nodes A, B and C, where the volume is created on A and replicated to B and C. The idea is that for a replicated volume, longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the volume's content to nodes B and C. Having this designated node id tells zedmanager and zedagent not to worry about content tree config/status as long as the Volume status can be fetched to start an app (trigger domainmgr).

If we don't do it this way, we would need a more complex mechanism such as cluster-wide pubsub, or big changes in the EVE microservices; this approach simplifies the code on remote nodes.

Now why does the cloud have to send the config to all nodes? It's mostly for handling failover: when an app fails over, kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in its config.
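
As a minimal sketch of the gating described above (illustrative only; these are not the actual zedmanager functions), a node that is not the designated downloader would start the app based on the Volume status alone:

```go
package gating

// storageReadyForApp sketches the behaviour described above: on a node that
// is not the designated downloader, the content tree status is ignored and
// only the Volume status (backed by the replicated PVC) gates starting the app.
func storageReadyForApp(volumeReady, contentTreeReady, isDesignatedNode bool) bool {
	if !isDesignatedNode {
		return volumeReady // longhorn already provides a replica; no local download
	}
	return volumeReady && contentTreeReady
}
```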

Contributor

@eriknordmark left a comment


LGTM

As we go forward we should document a bit more how this designated node id and other cluster concepts work - maybe start a docs/EDGE-NODE-CLUSTERS.md covering cluster formation, recovery/replacement, and updates. Does Andrew already have something written up to start such a document?

@zedi-pramodh
Author

I believe Andrew has multiple docs; we need to consolidate them into one and publish it as docs/EDGE-NODE-CLUSTERS.md.
We will work on that.

@deitch
Contributor

deitch commented Aug 28, 2024

> This is very similar to the Volumes struct, which already has a designated node id.

My understanding (correct me) is that a volume is a specific thing on a specific node. It lives only in one place, like a VM or a container, while a content tree is like a VM image or OCI image: it is referenced in many places, is identical, and can be instantiated in many places into a VM or container; once it is, its local instantiation (i.e. a volume) is unique. If you like, class vs object (I never really loved OOP, but the ideas are useful).

> Consider nodes A, B and C, where the volume is created on A and replicated to B and C. The idea is that for a replicated volume, longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the volume's content to nodes B and C.

I know you are right about this behaviour. The question is how to configure it. Sending a node a content tree in its config does not mean, "download this content tree", it means, "make sure this content tree exists on your node, I do not particularly care how." Until now, there was only one way, i.e. download it, so volumemgr (I think?) implicitly converted "here is a content tree you must have" to "here is a content tree to download". You are saying, there now are multiple ways of satisfying the requirement, "have this content tree available". If the implicit assumption of "here is a tree = download it" no longer holds, volumemgr should understand other ways of getting volumes.

In the end: zed* receives a config with content tree and volumes, passes them to volumemgr; when volumemgr is done, those trees and volumes exist. How it does so is transparent to everyone except volumemgr. Any changes to zed* or outside services make the whole structure more complex and create a lot of possibilities for breaking things.

> Now why does the cloud have to send the config to all nodes? It's mostly for handling failover: when an app fails over, kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in its config.

Controller always sends config to all nodes, otherwise a node cannot function. And it always should tell it, "here are all the things you need to know to function." Did I misunderstand the point you are making?
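
To illustrate the point about keeping the decision inside volumemgr (a sketch with invented names, not existing volumemgr code): the config still means "this content tree must exist here", and only volumemgr decides how that is satisfied:

```go
package main

import "fmt"

// AcquisitionMethod is a hypothetical way volumemgr could record how a
// content tree becomes available on a given node.
type AcquisitionMethod string

const (
	MethodDownload       AcquisitionMethod = "download"        // pull blobs from the datastore
	MethodClusterReplica AcquisitionMethod = "cluster-replica" // rely on the storage layer (replicated PVC)
)

// chooseMethod keeps the contract "a tree in the config must exist here",
// while hiding how it is satisfied from everything outside volumemgr.
func chooseMethod(isDesignatedNode, hasClusterStorage bool) AcquisitionMethod {
	if isDesignatedNode || !hasClusterStorage {
		return MethodDownload
	}
	return MethodClusterReplica
}

func main() {
	fmt.Println(chooseMethod(true, true))  // download on the designated node
	fmt.Println(chooseMethod(false, true)) // replica arrives via cluster storage
}
```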

@eriknordmark merged commit bc87898 into lf-edge:main Aug 28, 2024
4 checks passed
@eriknordmark mentioned this pull request Aug 28, 2024
@zedi-pramodh
Author

> > This is very similar to the Volumes struct, which already has a designated node id.

> My understanding (correct me) is that a volume is a specific thing on a specific node. It lives only in one place, like a VM or a container, while a content tree is like a VM image or OCI image: it is referenced in many places, is identical, and can be instantiated in many places into a VM or container; once it is, its local instantiation (i.e. a volume) is unique. If you like, class vs object (I never really loved OOP, but the ideas are useful).

> > Consider nodes A, B and C, where the volume is created on A and replicated to B and C. The idea is that for a replicated volume, longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the volume's content to nodes B and C.

> I know you are right about this behaviour. The question is how to configure it. Sending a node a content tree in its config does not mean, "download this content tree", it means, "make sure this content tree exists on your node, I do not particularly care how." Until now, there was only one way, i.e. download it, so volumemgr (I think?) implicitly converted "here is a content tree you must have" to "here is a content tree to download". You are saying, there now are multiple ways of satisfying the requirement, "have this content tree available". If the implicit assumption of "here is a tree = download it" no longer holds, volumemgr should understand other ways of getting volumes.

> In the end: zed* receives a config with content tree and volumes, passes them to volumemgr; when volumemgr is done, those trees and volumes exist. How it does so is transparent to everyone except volumemgr. Any changes to zed* or outside services make the whole structure more complex and create a lot of possibilities for breaking things.

The problem is that on the remote nodes we do not need to download the content at all, since the content is already converted to volumes and shows up as a PVC. For a native container OCI image we will download the content on the remote nodes too. So we need a way to tell whether the content download should be skipped on a given node.

> > Now why does the cloud have to send the config to all nodes? It's mostly for handling failover: when an app fails over, kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in its config.

> Controller always sends config to all nodes, otherwise a node cannot function. And it always should tell it, "here are all the things you need to know to function." Did I misunderstand the point you are making?

What I am trying to say is that the controller will send the app config to all nodes in a cluster even though the app is not supposed to run on every node. So we need a way to make sure a particular node does not download content unnecessarily.

zedi-pramodh pushed a commit to zedi-pramodh/eve that referenced this pull request Aug 28, 2024
Need to bump eve api due to PR lf-edge/eve-api#65

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
@deitch
Contributor

deitch commented Aug 29, 2024

> The problem is that on the remote nodes we do not need to download the content at all, since the content is already converted to volumes and shows up as a PVC. For a native container OCI image we will download the content on the remote nodes too. So we need a way to tell whether the content download should be skipped on a given node.

@zedi-pramodh totally agree about the behaviour. I am discussing where we flag, "this content tree can be made available via method A (download) on this node, and method B (volume replication) or C on that node." My point is, a content tree is an immutable thing, and is identical everywhere. Inclusion of a content tree in the config means, "be sure this content tree is available". It does not mean, "download this content tree". So if we want to tell a node how to get that tree, that is something other than a property of the tree (probably a property of the Volume).

> What I am trying to say is that the controller will send the app config to all nodes in a cluster even though the app is not supposed to run on every node. So we need a way to make sure a particular node does not download content unnecessarily.

Ah, ok, that makes sense. Brings us back to the previous point.

This is sort of like saying: I want to run a container made from the image alpine:3.20 on 3 nodes. Node A should download it and then sync it to B and C. I won't modify the tag or manifest of alpine:3.20 to tell A to download and B, C to sync; I will have the same reference to alpine:3.20 on all 3 nodes, and a separate instruction set that says, "here is how to get it", which in our case is (I think) Volumes, which already are node-specific.

zedi-pramodh pushed a commit to zedi-pramodh/eve that referenced this pull request Aug 29, 2024
Need to bump eve-api after PRs
lf-edge/eve-api#65
lf-edge/eve-api#63

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
zedi-pramodh pushed a commit to zedi-pramodh/eve that referenced this pull request Aug 29, 2024
Need to bump eve-api after following PRs
lf-edge/eve-api#63
lf-edge/eve-api#65

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
OhmSpectator pushed a commit to lf-edge/eve that referenced this pull request Aug 29, 2024
Need to bump eve-api after following PRs
lf-edge/eve-api#63
lf-edge/eve-api#65

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>