GPU/TPU-resource-usage-collect-addon #20
/assign @zhujian7 @haoqing0110 @qiujian16
Force-pushed from caa40ee to 83f79ab.
}

func (s *Score) calculateClusterAllocateable(resourceName clusterv1.ResourceName) (float64, error) {
// Iterate every node, find the node with maximum allocatable resource, return the number and node name.
func (s *Score) calculateClusterAllocatable(resourceName string) (float64, string, error) {
calculateMaxAllocatableNode?
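The behaviour described in the code comment above (iterate every node, keep the one with the largest allocatable amount, return both the amount and the node name) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the `Node` struct is a hypothetical stand-in for the Kubernetes node object, and the function uses the name suggested in the review rather than the method on `Score`.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, simplified stand-in for a Kubernetes node.
// It keeps only the node name and the allocatable amount per resource name.
type Node struct {
	Name        string
	Allocatable map[string]float64
}

// calculateMaxAllocatableNode iterates every node and returns the largest
// allocatable amount of resourceName together with the name of that node.
// Nodes that do not expose the resource are skipped.
func calculateMaxAllocatableNode(nodes []Node, resourceName string) (float64, string, error) {
	if len(nodes) == 0 {
		return 0, "", errors.New("no nodes to inspect")
	}
	var (
		maxValue float64
		maxNode  string
		found    bool
	)
	for _, n := range nodes {
		v, ok := n.Allocatable[resourceName]
		if !ok {
			continue
		}
		if !found || v > maxValue {
			maxValue, maxNode, found = v, n.Name, true
		}
	}
	if !found {
		return 0, "", fmt.Errorf("no node exposes resource %q", resourceName)
	}
	return maxValue, maxNode, nil
}

func main() {
	// Illustrative data only; node names and amounts are made up.
	nodes := []Node{
		{Name: "node-a", Allocatable: map[string]float64{"nvidia.com/gpu": 2}},
		{Name: "node-b", Allocatable: map[string]float64{"nvidia.com/gpu": 8}},
		{Name: "node-c", Allocatable: map[string]float64{"cpu": 16}},
	}
	value, name, err := calculateMaxAllocatableNode(nodes, "nvidia.com/gpu")
	fmt.Println(value, name, err)
}
```

Returning the node name alongside the amount lets a caller compute usage for that same node afterwards, which is presumably why the new signature includes it.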

After the deployment is complete, addon will create an addonplacementscore in its own namespace for each managedcluster in the hub.
On the hub cluster, you can see the `addonTemplate`, and check the `managedClusterAddon` status.
`AddOnTemplate`, `ManagedClusterAddon`.
Yes, I have changed that.
},
{
Name: "tpuAvailable",
Value: tpuScore,
},
Hmm, I'm wondering whether we should keep the current xxxAvailable as a cluster-level score and add a new score based on the max available node.
Or at least explain in the README that the score is based on the cluster's max available node.
Yes, I should add an explanation in the documentation.
As for an extra score, I think we can do that too. For example, each cluster could have `cpuClusterAvailable`, `cpuNodeAvailable`, `gpuClusterAvailable`, `gpuNodeAvailable`, etc.
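To make the distinction between the two proposed kinds of score concrete, here is a rough sketch. It assumes the per-node available amount of a resource is already known; the helper name `clusterAndNodeAvailable` and the sample data are hypothetical, and any normalization of raw amounts into a score range is left out.

```go
package main

import "fmt"

// clusterAndNodeAvailable takes the available amount of one resource on each
// node, keyed by node name. It returns the cluster-wide total and the largest
// amount available on any single node.
func clusterAndNodeAvailable(perNode map[string]float64) (clusterAvailable, nodeAvailable float64) {
	first := true
	for _, v := range perNode {
		clusterAvailable += v
		if first || v > nodeAvailable {
			nodeAvailable = v
			first = false
		}
	}
	return clusterAvailable, nodeAvailable
}

func main() {
	// Illustrative data only: free GPUs per node.
	gpuPerNode := map[string]float64{"node-a": 1, "node-b": 4, "node-c": 3}
	gpuClusterAvailable, gpuNodeAvailable := clusterAndNodeAvailable(gpuPerNode)
	fmt.Printf("gpuClusterAvailable=%v gpuNodeAvailable=%v\n", gpuClusterAvailable, gpuNodeAvailable)
}
```

The two values answer different questions: the cluster total reflects overall spare capacity, while the per-node maximum reflects the largest single workload that could fit on one node.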
}

func (s *Score) calculateNodeResourceUsage(nodeName string, resourceName string) (float64, error) {
list, err := s.podLister.List(labels.Everything())
`Completed` pods should be filtered out, e.g.:
NAME                   READY   STATUS      RESTARTS   AGE
demo1-job8n84t-bgzhs   0/1     Completed   0          19m
I have fixed that.
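The thread does not show how the fix was made. One way to exclude such pods, sketched here under assumptions, is to skip pods in a terminal phase while summing requests on a node. The `Pod` struct below is a hypothetical stand-in for the Kubernetes pod object; in Kubernetes, a pod listed as `Completed` has phase `Succeeded`, and this sketch also skips `Failed` pods as an additional assumption.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type Pod struct {
	Name     string
	NodeName string
	Phase    string             // "Pending", "Running", "Succeeded" or "Failed"
	Requests map[string]float64 // requested amount per resource name
}

// nodeResourceUsage sums the requests for resourceName over the pods
// scheduled on nodeName, ignoring pods that have already terminated.
func nodeResourceUsage(pods []Pod, nodeName, resourceName string) float64 {
	var usage float64
	for _, p := range pods {
		if p.NodeName != nodeName {
			continue
		}
		// Terminated pods no longer hold their requested resources.
		if p.Phase == "Succeeded" || p.Phase == "Failed" {
			continue
		}
		usage += p.Requests[resourceName]
	}
	return usage
}

func main() {
	// Illustrative data only; the second pod mirrors the completed job above.
	pods := []Pod{
		{Name: "trainer-0", NodeName: "node-b", Phase: "Running",
			Requests: map[string]float64{"nvidia.com/gpu": 2}},
		{Name: "demo1-job8n84t-bgzhs", NodeName: "node-b", Phase: "Succeeded",
			Requests: map[string]float64{"nvidia.com/gpu": 1}},
		{Name: "other", NodeName: "node-a", Phase: "Running",
			Requests: map[string]float64{"nvidia.com/gpu": 4}},
	}
	fmt.Println(nodeResourceUsage(pods, "node-b", "nvidia.com/gpu"))
}
```

Without the phase check, the completed job's request would still be counted and the node would appear to have less free capacity than it really does.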
@qiujian16 could you help with a final review?
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: qiujian16, z1ens
…ach managed clusters, develop the add-on in addontemplate mode. Signed-off-by: z1ens <xxtale02591@gmail.com>
Force-pushed from d748675 to 6503ea3.
/lgtm
/unhold
Merged commit ebebf50 into open-cluster-management-io:main.
…ach managed clusters, develop the add-on in addontemplate mode. (open-cluster-management-io#20) Signed-off-by: z1ens <xxtale02591@gmail.com>
`template` tag.
`cluster-role` and `addon-template`.
`placement` in a separated file.
`role` and `rolebinding` in `addontemplate.go`
Update on Aug.15th:
REF: open-cluster-management-io/ocm#369
Old PR: #16 (comment)