
Number of underutilized nodes is inaccurate #702

Closed
tomsunyu opened this issue Jan 26, 2022 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tomsunyu

My descheduler policy:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
         enabled: true
      "RemovePodsViolatingInterPodAntiAffinity":
         enabled: true
      "LowNodeUtilization":
         enabled: true
         params:
           nodeResourceUtilizationThresholds:
             thresholds:
               "cpu" : 35
               "memory": 35
               "pods": 30
             targetThresholds:
               "cpu" : 65
               "memory": 65
               "pods": 50

The descheduler logs:

I0126 06:30:03.066133       1 nodeutilization.go:164] "Node is underutilized" node="k8sm1" usage=map[cpu:3992m memory:7321672Ki pods:31] usagePercentage=map[cpu:25.265822784810126 memory:11.22652574277022 pods:28.181818181818183]
I0126 06:30:03.066197       1 nodeutilization.go:164] "Node is underutilized" node="k8sm2" usage=map[cpu:4132m memory:7676964Ki pods:23] usagePercentage=map[cpu:26.151898734177216 memory:11.771303521962317 pods:20.90909090909091]
I0126 06:30:03.066220       1 nodeutilization.go:170] "Node is appropriately utilized" node="k8sm3" usage=map[cpu:3382m memory:6337572Ki pods:35] usagePercentage=map[cpu:21.40506329113924 memory:9.717576323699024 pods:31.818181818181817]
I0126 06:30:03.066239       1 nodeutilization.go:164] "Node is underutilized" node="k8sn1" usage=map[cpu:3874m memory:8434724Ki pods:30] usagePercentage=map[cpu:24.364779874213838 memory:12.881420892109507 pods:27.272727272727273]
I0126 06:30:03.066257       1 nodeutilization.go:164] "Node is underutilized" node="k8sn2" usage=map[cpu:3272m memory:2231332Ki pods:28] usagePercentage=map[cpu:20.57861635220126 memory:3.4076665273259077 pods:25.454545454545453]
I0126 06:30:03.066273       1 nodeutilization.go:170] "Node is appropriately utilized" node="k8sn3" usage=map[cpu:2232m memory:5926948Ki pods:47] usagePercentage=map[cpu:14.037735849056604 memory:9.0515720246029 pods:42.72727272727273]
I0126 06:30:03.066292       1 lownodeutilization.go:100] "Criteria for a node under utilization" CPU=35 Mem=35 Pods=30
I0126 06:30:03.066302       1 lownodeutilization.go:101] "Number of underutilized nodes" totalNumber=4
I0126 06:30:03.066313       1 lownodeutilization.go:114] "Criteria for a node above target utilization" CPU=65 Mem=65 Pods=50
I0126 06:30:03.066327       1 lownodeutilization.go:115] "Number of overutilized nodes" totalNumber=0
I0126 06:30:03.066338       1 lownodeutilization.go:133] "All nodes are under target utilization, nothing to do here"

I read the logs carefully and found that the node CPU and memory utilization figures obtained by the descheduler are inaccurate.

The following is the actual node CPU and memory utilization reported by the kubelet (the output format matches kubectl top nodes):

NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8sm1   3538m        22%    15846Mi         24%
k8sm2   1003m        6%     21607Mi         33%
k8sm3   1294m        8%     26295Mi         41%
k8sn1   1079m        6%     17602Mi         27%
k8sn2   2675m        16%    16208Mi         25%
k8sn3   2397m        15%    29483Mi         46%
tomsunyu added the kind/bug label Jan 26, 2022
@damemi
Contributor

damemi commented Jan 27, 2022

Hi @tomsunyu
This is because the descheduler calculates node usage from pod resource requests, not from actual usage metrics. This is intentional, so that the descheduler stays aligned with the scheduler, which makes placement decisions from the same request-based numbers. The kubelet, by contrast, reports actual current usage from live metrics.
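
For illustration, here is a minimal sketch (not the descheduler's actual code) of that request-based calculation: sum the resource requests of the pods on a node and divide by the node's allocatable capacity. The individual pod requests below are hypothetical; the allocatable figure of 15800m is back-calculated from the k8sm1 log line above (3992m / 25.27% ≈ 15800m).

package main

import "fmt"

func main() {
	// Hypothetical CPU requests (millicores) for the pods on a node;
	// their sum, 3992m, matches the k8sm1 log line above.
	podCPURequests := []int64{1500, 1000, 750, 742}

	// Allocatable CPU back-calculated from the same log line:
	// 3992m / 25.27% ≈ 15800m.
	allocatableCPU := int64(15800)

	var requested int64
	for _, r := range podCPURequests {
		requested += r
	}

	// This requests/allocatable ratio is the usagePercentage in the logs.
	// It says nothing about what the pods actually consume, which is why
	// it can differ sharply from the kubelet's live metrics.
	pct := float64(requested) / float64(allocatableCPU) * 100
	fmt.Printf("cpu=%dm usagePercentage=%.2f\n", requested, pct)
}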

It has been proposed to add an option for the NodeUtilization strategies to evict based on actual usage metrics, though no one has had the bandwidth to implement it yet. Doing so would also require a custom scheduler that places pods based on actual usage (rather than requests); otherwise the scheduler and descheduler would conflict.

Instead, the recommended best practice is to ensure your pods' requests and limits are close to their actual usage, and to experiment with adjusting these values if they are not. We are, however, open to any proposed implementation of metrics-based descheduling.
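
As a rough sketch of that practice (a hypothetical helper, not part of any Kubernetes API), one can compare each pod's declared request against its observed usage and flag large drift:

package main

import "fmt"

// driftRatio compares observed usage to the declared request.
// Values near 1.0 mean the request reflects reality; values far
// below 1.0 mean the pod is over-requesting, which inflates the
// descheduler's request-based utilization numbers.
func driftRatio(requestMilli, observedMilli int64) float64 {
	return float64(observedMilli) / float64(requestMilli)
}

func main() {
	// Made-up example: a pod requesting 2000m CPU but using only 400m.
	fmt.Printf("drift=%.2f\n", driftRatio(2000, 400)) // 0.20 => over-requested
}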

See the related discussion in #225, #437, #270, #118, and #90. Given those, I'm going to close this as a duplicate. Please feel free to follow up in one of the other open issues on this topic. Thank you!
/close

@k8s-ci-robot
Contributor

@damemi: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
