
Add designated node id to contenttree #65

Merged
merged 2 commits into lf-edge:main
Aug 28, 2024

Conversation

zedi-pramodh

Add designated node id to content tree
This node id is used to decide whether the content tree's content needs to be downloaded on a particular edge node. On clustered EVE devices this value determines which node is responsible for downloading the content. This is very similar to the Volume message, where the designated node id determines the node responsible for creating the Volume.

On single-node EVE installations the assumption is that the controller will pass in a non-nil value, but the code should handle nil values gracefully.
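
For illustration only, here is a minimal sketch in Go of the intended behaviour; the type, field and function names are assumptions, not the actual eve-api or EVE pubsub definitions. An empty designated node id falls back to downloading locally, which covers both the single-node case and a controller that does not set the field:

```go
package main

import "fmt"

// ContentTreeConfig is a simplified stand-in for the real content tree config;
// DesignatedNodeID represents the field added by this PR (name assumed here).
type ContentTreeConfig struct {
	ContentID        string
	DesignatedNodeID string // empty on single-node installs or older controllers
}

// shouldDownload reports whether this node is responsible for downloading the
// content tree. An empty designated node id is handled gracefully by falling
// back to "download here", i.e. the pre-cluster behaviour.
func shouldDownload(cfg ContentTreeConfig, localNodeID string) bool {
	if cfg.DesignatedNodeID == "" {
		return true
	}
	return cfg.DesignatedNodeID == localNodeID
}

func main() {
	cfg := ContentTreeConfig{ContentID: "ct-1", DesignatedNodeID: "node-A"}
	fmt.Println(shouldDownload(cfg, "node-A")) // true: this node downloads
	fmt.Println(shouldDownload(cfg, "node-B")) // false: skip the download here
}
```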

Pramodh Pallapothu added 2 commits August 23, 2024 11:02
@zedi-pramodh requested review from naiming-zededa and eriknordmark and removed the request for eriknordmark August 23, 2024 18:18
Contributor

@naiming-zededa left a comment


LGTM

@zedi-pramodh requested a review from deitch August 27, 2024 18:38
@deitch
Contributor

deitch commented Aug 27, 2024

I'm somewhat confused by this. A content tree is just a reference to content. When we send it to a node, we are saying, you want this (well, maybe connected to a volume). Doesn't it muddle the content resource if we add a node to it?

If we want to say, "here is a content tree, but don't download it", shouldn't we not send it to that node? Isn't that the job of the controller, to determine who will get the message, "download this tree"?

Conversely, if we really have to send it to multiple nodes - it would help to understand why the controller would send it to nodes A, B, C if only B will download it - then I think a separate structure would be the right place. "This is a content tree" should look identical everywhere. "This is a list of nodes that should have content tree X", where X is the reference to the aforementioned content tree, is distinct.
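
For concreteness, a rough sketch of that alternative (invented names, not an existing eve-api message): the content tree stays identical everywhere, and a separate, node-oriented structure carries the "who downloads it" decision:

```go
package placement

// ContentTree stays identical everywhere; it only describes the content.
type ContentTree struct {
	ID  string
	URL string
}

// ContentTreePlacement is the separate structure suggested above: it maps a
// content tree reference to the nodes expected to download it, leaving the
// tree itself untouched.
type ContentTreePlacement struct {
	ContentTreeID   string   // reference to ContentTree.ID
	DownloaderNodes []string // e.g. only B, even if A, B and C all receive the tree
}
```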

@zedi-pramodh
Author

This is very similar to the Volumes struct, which already has a designated node id. Consider nodes A, B and C, where the volume is created on A and replicated to B and C. The idea is that for a replicated volume, longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the volume's content to nodes B and C. Having this designated node id tells zedmanager and zedagent not to worry about content tree config/status as long as the Volume status can be fetched to start an app (trigger domainmgr).

If we don't do it this way, we would need a more complex mechanism such as cluster-wide pubsub, or big changes in the EVE microservices; this approach simplifies the code on remote nodes.

Now why does the cloud have to send the config to all nodes? It's mostly for handling failover: when an app fails over, kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in its config.
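
As a minimal sketch of the gating described above (illustrative only; these are not the actual zedmanager functions), a node that is not the designated downloader would start the app based on the Volume status alone:

```go
package gating

// storageReadyForApp sketches the behaviour described above: on a node that
// is not the designated downloader, the content tree status is ignored and
// only the Volume status (backed by the replicated PVC) gates starting the app.
func storageReadyForApp(volumeReady, contentTreeReady, isDesignatedNode bool) bool {
	if !isDesignatedNode {
		return volumeReady // longhorn already provides a replica; no local download
	}
	return volumeReady && contentTreeReady
}
```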

Contributor

@eriknordmark left a comment


LGTM

As we go forward we should document a bit more how this designated node id and other cluster concepts work - maybe start a docs/EDGE-NODE-CLUSTERS.md covering cluster formation, recovery/replacement, and updates. Does Andrew already have something written up to start such a document?

@zedi-pramodh
Author

I believe Andrew has multiple docs; we need to consolidate them into one and publish it as docs/EDGE-NODE-CLUSTERS.md.
We will work on that.

@deitch
Contributor

deitch commented Aug 28, 2024

> This is very similar to the Volumes struct, which already has a designated node id.

My understanding (correct me) is that a volume is a specific thing on a specific node. It lives only in one place, like a VM or a container, while a content tree is like a VM image or OCI image: it is referenced in many places, is identical, and can be instantiated in many places into a VM or container; once it is, its local instantiation (i.e. a volume) is unique. If you like, class vs object (I never really loved OOP, but the ideas are useful).

> Consider nodes A, B and C, where the volume is created on A and replicated to B and C. The idea is that for a replicated volume, longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the volume's content to nodes B and C.

I know you are right about this behaviour. The question is how to configure it. Sending a node a content tree in its config does not mean, "download this content tree", it means, "make sure this content tree exists on your node, I do not particularly care how." Until now, there was only one way, i.e. download it, so volumemgr (I think?) implicitly converted "here is a content tree you must have" to "here is a content tree to download". You are saying, there now are multiple ways of satisfying the requirement, "have this content tree available". If the implicit assumption of "here is a tree = download it" no longer holds, volumemgr should understand other ways of getting volumes.

In the end: zed* receives a config with content tree and volumes, passes them to volumemgr; when volumemgr is done, those trees and volumes exist. How it does so is transparent to everyone except volumemgr. Any changes to zed* or outside services make the whole structure more complex and create a lot of possibilities for breaking things.

> Now why does the cloud have to send the config to all nodes? It's mostly for handling failover: when an app fails over, kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in its config.

Controller always sends config to all nodes, otherwise a node cannot function. And it always should tell it, "here are all the things you need to know to function." Did I misunderstand the point you are making?
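
To illustrate the point about keeping the decision inside volumemgr (a sketch with invented names, not existing volumemgr code): the config still means "this content tree must exist here", and only volumemgr decides how that is satisfied:

```go
package main

import "fmt"

// AcquisitionMethod is a hypothetical way volumemgr could record how a
// content tree becomes available on a given node.
type AcquisitionMethod string

const (
	MethodDownload       AcquisitionMethod = "download"        // pull blobs from the datastore
	MethodClusterReplica AcquisitionMethod = "cluster-replica" // rely on the storage layer (replicated PVC)
)

// chooseMethod keeps the contract "a tree in the config must exist here",
// while hiding how it is satisfied from everything outside volumemgr.
func chooseMethod(isDesignatedNode, hasClusterStorage bool) AcquisitionMethod {
	if isDesignatedNode || !hasClusterStorage {
		return MethodDownload
	}
	return MethodClusterReplica
}

func main() {
	fmt.Println(chooseMethod(true, true))  // download on the designated node
	fmt.Println(chooseMethod(false, true)) // replica arrives via cluster storage
}
```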

@eriknordmark merged commit bc87898 into lf-edge:main Aug 28, 2024
4 checks passed
@eriknordmark mentioned this pull request Aug 28, 2024
@zedi-pramodh
Author

> > This is very similar to the Volumes struct, which already has a designated node id.

> My understanding (correct me) is that a volume is a specific thing on a specific node. It lives only in one place, like a VM or a container, while a content tree is like a VM image or OCI image: it is referenced in many places, is identical, and can be instantiated in many places into a VM or container; once it is, its local instantiation (i.e. a volume) is unique. If you like, class vs object (I never really loved OOP, but the ideas are useful).

> > Consider nodes A, B and C, where the volume is created on A and replicated to B and C. The idea is that for a replicated volume, longhorn is already creating a replica PVC on nodes B and C, so there is absolutely no need to download the volume's content to nodes B and C.

> I know you are right about this behaviour. The question is how to configure it. Sending a node a content tree in its config does not mean, "download this content tree", it means, "make sure this content tree exists on your node, I do not particularly care how." Until now, there was only one way, i.e. download it, so volumemgr (I think?) implicitly converted "here is a content tree you must have" to "here is a content tree to download". You are saying, there now are multiple ways of satisfying the requirement, "have this content tree available". If the implicit assumption of "here is a tree = download it" no longer holds, volumemgr should understand other ways of getting volumes.

> In the end: zed* receives a config with content tree and volumes, passes them to volumemgr; when volumemgr is done, those trees and volumes exist. How it does so is transparent to everyone except volumemgr. Any changes to zed* or outside services make the whole structure more complex and create a lot of possibilities for breaking things.

The problem is that on the remote nodes we do not need to download the content at all, since the content is already converted to volumes and shows up as a PVC. For a native container OCI image we will download the content on the remote nodes too. So we need a way to tell whether the content download should be skipped on a given node.

> > Now why does the cloud have to send the config to all nodes? It's mostly for handling failover: when an app fails over, kubernetes will start the app, but domainmgr needs to report back to the cloud, and domainmgr only looks for an app if it is in its config.

> Controller always sends config to all nodes, otherwise a node cannot function. And it always should tell it, "here are all the things you need to know to function." Did I misunderstand the point you are making?

What I am trying to say is that the controller will send the app config to all nodes in a cluster even though the app is not supposed to run on every node. So we need a way to make sure a particular node does not download content unnecessarily.

zedi-pramodh pushed a commit to zedi-pramodh/eve that referenced this pull request Aug 28, 2024
Need to bump eve api due to PR lf-edge/eve-api#65

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
@deitch
Contributor

deitch commented Aug 29, 2024

> The problem is that on the remote nodes we do not need to download the content at all, since the content is already converted to volumes and shows up as a PVC. For a native container OCI image we will download the content on the remote nodes too. So we need a way to tell whether the content download should be skipped on a given node.

@zedi-pramodh totally agree about the behaviour. I am discussing where we flag, "this content tree can be made available via method A (download) on this node, and method B (volume replication) or C on that node." My point is, a content tree is an immutable thing, and is identical everywhere. Inclusion of a content tree in the config means, "be sure this content tree is available". It does not mean, "download this content tree". So if we want to tell a node how to get that tree, that is something other than a property of the tree (probably a property of the Volume).

> What I am trying to say is that the controller will send the app config to all nodes in a cluster even though the app is not supposed to run on every node. So we need a way to make sure a particular node does not download content unnecessarily.

Ah, ok, that makes sense. Brings us back to the previous point.

This is sort of like saying: I want to run a container made from the image alpine:3.20 on 3 nodes. Node A should download it and then sync it to B and C. I won't modify the tag or manifest of alpine:3.20 to tell A to download and B, C to sync; I will have the same reference to alpine:3.20 on all 3 nodes, and a separate instruction set that says, "here is how to get it", which in our case is (I think) Volumes, which already are node-specific.

zedi-pramodh pushed a commit to zedi-pramodh/eve that referenced this pull request Aug 29, 2024
Need to bump eve-api after PRs
lf-edge/eve-api#65
lf-edge/eve-api#63

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
zedi-pramodh pushed a commit to zedi-pramodh/eve that referenced this pull request Aug 29, 2024
Need to bump eve-api after following PRs
lf-edge/eve-api#63
lf-edge/eve-api#65

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
OhmSpectator pushed a commit to lf-edge/eve that referenced this pull request Aug 29, 2024
Need to bump eve-api after following PRs
lf-edge/eve-api#63
lf-edge/eve-api#65

Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>