BestEffort pods are using swap #343

Open
robertbotez opened this issue Apr 1, 2024 · 5 comments

Comments

@robertbotez

robertbotez commented Apr 1, 2024

What happened?

I already opened a ticket on the kubernetes repo, which led me here.

I was testing the support for swap and ran into unexpected behavior. The documentation specifies that only pods in the Burstable QoS class can use the host's swap memory. However, I created two deployments, each with 1 ubuntu replica: one belonging to the Burstable class and one to the BestEffort class, and in each I ran stress --vm 1 --vm-bytes 6G --vm-hang 0 to observe memory consumption. The host has 4GB of RAM and 5GB of swap. In both cases, the pod started using swap once it exceeded the available RAM. Wasn't the BestEffort pod supposed to be restarted when it reached the limit of the host's RAM? Note that the kubelet is configured with swapBehavior=LimitedSwap. I attached two screenshots showing the host's normal consumption and the consumption after running the stress command inside the pod.
[Screenshot: Screenshot 2024-03-27 at 12 02 30 — normal host consumption]
[Screenshot: screenshot_2024-03-27_at_12 37 02 — consumption after running stress]

What did you expect to happen?

I expected the BestEffort pod to be killed once it consumed more RAM than the host has available.
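
For context, under KEP 2400's LimitedSwap mode the kubelet is only supposed to grant swap to Burstable pods, proportionally to their memory requests, while BestEffort and Guaranteed pods get no swap at all. Here is a rough sketch of that rule as I read the KEP (an illustration only, not the kubelet's actual code):

package main

import "fmt"

// Sketch of the per-container swap limit under LimitedSwap as described in
// KEP 2400. Illustration of the rule only, not the kubelet's implementation.
func swapLimitBytes(qosClass string, memoryRequest, nodeMemory, nodeSwap int64) int64 {
    // Only Burstable containers are supposed to get swap;
    // BestEffort and Guaranteed containers get none.
    if qosClass != "Burstable" {
        return 0
    }
    // Burstable containers get a share of node swap proportional to their
    // memory request relative to the node's memory capacity.
    return int64(float64(memoryRequest) / float64(nodeMemory) * float64(nodeSwap))
}

func main() {
    const gi = int64(1) << 30
    // Numbers from this report: 4GB of RAM, 5GB of swap.
    fmt.Println(swapLimitBytes("BestEffort", 0, 4*gi, 5*gi))   // 0 -> no swap at all
    fmt.Println(swapLimitBytes("Burstable", 1*gi, 4*gi, 5*gi)) // ~1.25GB of swap
}

So under LimitedSwap the BestEffort pod should end up with memory.swap.max set to 0, which is why I did not expect it to touch swap at all.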

How can we reproduce it (as minimally and precisely as possible)?

  • set up a VM running Ubuntu 22.04 with 4GB of RAM
  • set the swap partition to 5GB
  • install the docker, cri-dockerd and kubernetes packages at the versions listed below
  • configure the kubelet with the config provided below
  • install the Calico CNI
  • after the cluster is bootstrapped, apply the following deployment
$ cat test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu-deployment
  labels:
    app: ubuntu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - name: ubuntu
        image: ubuntu:22.04
        resources:
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
  • this should deploy a BestEffort pod; you can check this by running kubectl get pod <pod-name> --output=yaml
  • exec into the pod and run apt update && apt install stress, then run stress --vm 1 --vm-bytes 6G --vm-hang 0
  • check which node the pod is running on with kubectl get po -o wide, then ssh to that node and run htop. You should now see that the deployed BestEffort pod is consuming swap memory, which, according to the docs, it shouldn't.
  • if you exec into the pod and check memory.swap.max, it is set to max. From what I understand, even though swapBehavior is set to LimitedSwap in the kubelet, cri-dockerd may be setting the cgroup's memory.swap.max to max (see the sketch after this block).
$ cat /sys/fs/cgroup/memory.swap.max 
max
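
My assumption here (a simplified sketch of how cgroup v2 runtimes generally derive memory.swap.max from a combined memory+swap limit, not cri-dockerd's or Docker's actual code) is that when no swap limit reaches the runtime for the container, memory.swap.max is simply left at max, which matches what I see above:

package main

import (
    "fmt"
    "strconv"
)

// Simplified sketch: how a cgroup v2 runtime typically turns a combined
// "memory+swap" limit into memory.swap.max. Assumption for illustration only.
func memorySwapMax(memoryLimit, memorySwapLimit int64) string {
    switch {
    case memorySwapLimit == 0:
        // No limit was requested at all: swap stays unlimited.
        return "max"
    case memorySwapLimit == -1:
        // Explicitly unlimited.
        return "max"
    default:
        // cgroup v2 tracks swap separately, so subtract the memory limit.
        return strconv.FormatInt(memorySwapLimit-memoryLimit, 10)
    }
}

func main() {
    fmt.Println(memorySwapMax(0, 0))         // "max" -> what the BestEffort pod gets today
    fmt.Println(memorySwapMax(1<<30, 1<<30)) // "0"   -> swap fully disabled
}

In other words, for LimitedSwap to take effect the runtime would have to receive a concrete swap limit (0 for a BestEffort pod) rather than nothing.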

Anything else we need to know?

I am using cgroup v2.

Here is my kubelet config.

$ cat /var/lib/kubelet/config.yaml 
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
 anonymous:
  enabled: false
 webhook:
  cacheTTL: 0s
  enabled: true
 x509:
  clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
 mode: Webhook
 webhook:
  cacheAuthorizedTTL: 0s
  cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
enableServer: true
evictionPressureTransitionPeriod: 0s
failSwapOn: false
featureGates:
 NodeSwap: true
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
 flushFrequency: 0
 options:
  json:
   infoBufferSize: "0"
 verbosity: 0
memorySwap:
 swapBehavior: LimitedSwap
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Kubernetes version

$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3

Cloud provider

Hetzner Cloud, but Kubernetes was deployed using `kubeadm`.

OS version

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux fs-kube-dev-1 5.15.0-100-generic #110-Ubuntu SMP Wed Feb 7 13:27:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Install tools

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.3", GitCommit:"6813625b7cd706db5bc7388921be03071e1a492d", GitTreeState:"clean", BuildDate:"2024-03-15T00:06:16Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}

Container runtime (CRI) and version (if applicable)

$ cri-dockerd --version
cri-dockerd 0.3.11 (9a8a9fe)

Related plugins (CNI, CSI, ...) and versions (if applicable)

calico:
version: 3.27.2

@iholder101

/cc

@neersighted
Collaborator

This is because KEP 2400 was never supported, as best I can tell.

@kannon92

kannon92 commented Apr 3, 2024

Yeah, it's more of a feature request for KEP 2400. I was hoping someone in the cri-dockerd community could explore implementing this?
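
For whoever picks this up, my rough mental model (assumptions about the plumbing, not actual cri-dockerd code) is that the kubelet already computes the per-container swap limit and exposes it through the CRI LinuxContainerResources, and cri-dockerd would need to carry that value into the Docker HostConfig so the runtime can set memory.swap.max. Something along these lines:

package main

import "fmt"

// Hypothetical sketch of the mapping cri-dockerd might add for KEP 2400.
// It assumes both the CRI swap field and Docker's HostConfig.MemorySwap are
// combined memory+swap limits, so the value can roughly be passed through.
// This is a discussion aid, not a patch.
func memorySwapForDocker(memoryLimit, criMemorySwapLimit int64) int64 {
    if criMemorySwapLimit > 0 {
        // The kubelet granted some swap (a Burstable pod under LimitedSwap): honor it.
        return criMemorySwapLimit
    }
    // One possible choice when no swap is granted (e.g. a BestEffort pod):
    // set memory+swap equal to memory, which disables swap for the container
    // instead of leaving memory.swap.max at "max".
    return memoryLimit
}

func main() {
    const gi = int64(1) << 30
    fmt.Println(memorySwapForDocker(2*gi, 0))    // no swap granted -> memory+swap == memory
    fmt.Println(memorySwapForDocker(2*gi, 3*gi)) // 1GB of swap on top of a 2GB memory limit
}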

@neersighted
Collaborator

PRs are welcome, and a couple of the regular contributors have done other KEP enablement work and might be interested in picking this up (but also I can't speak for their interest or priorities).

@afbjorklund
Contributor

I think it was a known issue; support for Docker is not a requirement for adding new features.

Memory QoS in Alpha phase is designed to support containerd and cri-o.
