-
Notifications
You must be signed in to change notification settings - Fork 585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for Dynamic Resource Allocation #1078
Conversation
8588258
to
812d897
Compare
Moshe, small nit before we look into this: please split up the commit into "upgrade kube to v0.27.0 + vendor that" and then do the remaining "real" changes. This would make it easier to review. |
Sure will do |
are there more comments on this PR? |
The main comment (without looking into how DRA exactly works) from my side is that we might want to have a function that gets or adds the dynamic resources. The current change adds the for-loop in line, which increases the nesting quite a bit. After reading the code, I want to make sure I understand what's going on here. @moshe010 , is there are reason why we don't want to change pkg/checkpoint/checkpoint.go ? Most importantly, @moshe010 , Vrinda from my team will be taking a look into this in more detail. Please allow a bit more time for some comments. We want to understand how this would integrate with sriov network operator. |
Sure I will created separate function for dynamic resources
|
ec58c31
to
bedad0c
Compare
LGTM |
Is there any other who required to review or can we merge this? |
Can you update the docs ? I am looking for a couple paragraphs about the feature and some user documentation on how can the user it. |
sure, you want to me to add DRA section here https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/how-to-use.md? |
Unsure what's the best place ... But yeah, let's put it there, if needed you'll move it around. |
The CI failed don't seem to be related to the PR see [1]. [1] - |
Hey @moshe010 ; all good. Regarding jinja, you need to update it to latest version (we actually couldn't find an updated version, and thus replaced it). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat docs.
Just pointing out some typo.
I see so it mean that some will need to do similar fix to this repo to fix the jinja error. |
pushed fix #1189 |
Are we ok with this change? are we waiting for someone else to review it? |
docs/how-to-use.md
Outdated
|
||
#### Install DRA driver | ||
|
||
The current example uses Nvidia DRA driver for networking. This DRA driver is not publicly available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any plan to release this DRA driver? Or it is never released for internal use?
Folks, I've added an integration test with an example dra driver implementation as discussed in the last community meeting. Please have a look at the latest commit and let me if that's sufficient to get that PR merged. Notice the condition on running the integration test. This looks irrelevant to this PR. If you have any insights let me know. |
Signed-off-by: Moshe Levi <moshele@nvidia.com>
Signed-off-by: Moshe Levi <moshele@nvidia.com>
Signed-off-by: Vasilis Remmas <vremmas@nvidia.com>
We now run the e2e tests for both thick and thin as requested in the community meeting today. The difference is that all the DRA example Pods are running on the worker nodes. Before, the controller Pod was running on the control plane node where the network is not stable. I have added this PR #1259 to demonstrate what's failing and prove that this is not a regression introduced by this PR. |
Dynamic Resource Allocation is alternative mechanism to device plugin which allow to requests pod and container resources. The feature is alpha in k8s 1.27. | ||
|
||
The following sections describe how to use DRA with multus and Nvidia DRA driver. Other DRA networking driver vendors should follow similar concepts to make use of multus DRA support. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some text along the lines of:
Dynamic Resource Allocation (DRA) is [currently an alpha](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/), and is subject to change. Please consider this functionality as a preview, and utilized a potentially replace for device plugin for SR-IOV. The architecture and usage of DRA in Multus CNI may be changed in the future as this technology matures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, updated! Added the fact that DRA is in alpha as warning.
Signed-off-by: Vasilis Remmas <vremmas@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the patience getting this together.
Dynamic Resource Allocation was added to k8s 1.26 [1].
In k8s 1.27 we added PodResources API to expose the dynamic resources from kubelet.
This PR allow multus to get Dynamic Resource from the podResource API and pass it to CNI as DeviceID.
To use this you will need k8s .127 with DynamicResourceAllocation and KubeletPodResourcesDynamicResources feature gates enabled.
[1] - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3063-dynamic-resource-allocation
[2] - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3695-pod-resources-for-dra