Error when draining pods with kubernetes upgrade #2866

Closed
nicolaspernoud opened this issue Sep 17, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@nicolaspernoud

Kairos version:

PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
KAIROS_VERSION="v3.1.2-v1.30.4-k3s1"
KAIROS_IMAGE_LABEL="24.04-standard-amd64-generic-v3.1.2-k3sv1.30.4-k3s1"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_ID="kairos"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-standard-amd64-generic-v3.1.2-k3sv1.30.4+k3s1"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_NAME="kairos-standard-ubuntu-24.04"
KAIROS_PRETTY_NAME="kairos-standard-ubuntu-24.04 v3.1.2-v1.30.4-k3s1"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v3.1.2-k3sv1.30.4-k3s1"
KAIROS_VARIANT="standard"
KAIROS_MODEL="generic"
KAIROS_TARGETARCH="amd64"
KAIROS_RELEASE="v3.1.2"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_ID_LIKE="kairos-standard-ubuntu-24.04"
KAIROS_VERSION_ID="v3.1.2-v1.30.4-k3s1"
KAIROS_FLAVOR="ubuntu"
KAIROS_FAMILY="ubuntu"

CPU architecture, OS, and Version:
Linux *** 6.8.0-41-generic #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 2 20:41:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug
When running a Kubernetes upgrade following https://kairos.io/docs/upgrade/kubernetes/, the drain step fails to delete the existing pods, so the upgrade does not start.

To Reproduce
Upgrade as explained in the documentation.

Expected behavior
The upgrade should run.

Logs
kubectl -n system-upgrade logs apply-custom-os-upgrade-on-**** -c drain gives:

Warning: ignoring DaemonSet-managed Pods: kube-system/svclb-traefik-775db17a-xxldl
There are pending pods in node "****" when an error occurred: pods "coredns-576bfc4dc7-sc2b8" is forbidden: User "system:serviceaccount:system-upgrade:system-upgrade" cannot delete resource "pods" in API group "" in the namespace "kube-system"
@nicolaspernoud nicolaspernoud added bug Something isn't working triage Add this label to issues that should be triaged and prioretized in the next planning call unconfirmed labels Sep 17, 2024
@jimmykarily jimmykarily moved this to In Progress 🏃 in 🧙Issue tracking board Sep 23, 2024
@jimmykarily jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board Sep 30, 2024
@mudler mudler mentioned this issue Oct 1, 2024
42 tasks
@jimmykarily
Contributor

Our instructions on how to install the system upgrade controller needed updating (see here). This seems to be related to missing RBAC permissions, and it may be solved by installing the latest version following the new instructions.

Let me finish documenting the installation and we can check if this one is fixed.
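
One way to check the suspected RBAC gap directly is to ask the API server whether the service account named in the error may delete pods. The sketch below wraps `kubectl auth can-i` in a small helper; the function name `can_sa_delete_pods` is hypothetical, and the service account is the one quoted in the error message above. It assumes the current kubectl user is allowed to impersonate service accounts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kairos or system-upgrade-controller).
# Asks the API server whether the system-upgrade service account may
# delete pods in the given namespace. kubectl prints "yes" or "no" and
# exits non-zero when the answer is "no".
can_sa_delete_pods() {
  namespace="$1"
  kubectl auth can-i delete pods \
    --namespace "$namespace" \
    --as "system:serviceaccount:system-upgrade:system-upgrade"
}
```

Calling `can_sa_delete_pods kube-system` against the affected cluster would be expected to answer "no" while the permissions are missing, matching the "forbidden" error in the drain logs.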

@jimmykarily jimmykarily removed the triage Add this label to issues that should be triaged and prioretized in the next planning call label Oct 3, 2024
@nicolaspernoud
Author

Ok, thanks for the update. I will give it a try when the documentation is up to date.

@jimmykarily
Contributor

I tried with the new instructions (see the linked PR) and a build from kairos master, and it works as expected:

root@localhost:/home/kairos# kubectl logs -c drain  -n system-upgrade apply-os-upgrade-on-localhost-with-e22018c23a3813e297c32f-s97j5
node/localhost cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/svclb-traefik-94541d92-n6sbc
pod/git-gk2v6 deleted
pod/helm-install-traefik-crd-l5rxt deleted
pod/helm-install-traefik-d4hth deleted
pod/traefik-8dc7cf49b-72jlk deleted
pod/coredns-56f6fc8fd7-tfwwp deleted
pod/metrics-server-5985cbc9d7-jngf7 deleted
pod/local-path-provisioner-846b9dcb6c-hdwf7 deleted
node/localhost drained

(those are the drain container logs; the upgrade also works with no errors).

@nicolaspernoud
Author

I confirm that it works.
I now have another problem.
When trying a custom upgrade as described here: https://kairos.io/docs/upgrade/kubernetes/#customize-the-upgrade-plan ... I get Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown ...
Is this supposed to work?

@jimmykarily
Contributor

I confirm that it works. I now have another problem. When trying a custom upgrade as described here: https://kairos.io/docs/upgrade/kubernetes/#customize-the-upgrade-plan ... I get Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown ... Is this supposed to work?

I think somehow the image: field is not concatenated with the version: one so it pulls latest. This indicates I'm probably correct:

~ $ docker run --rm -it quay.io/kairos/ubuntu /bin/bash
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested
exec /bin/bash: no such file or directory

The question is, why?

@jimmykarily
Contributor

We are supposed to put the tag in the image: field too: https://github.com/rancher/system-upgrade-controller/blob/f7f79ff18733b3c67ef0860db635abcf2391430f/examples/openSUSE/microos.yaml#L38

Then what is the version: for? (I'm looking)

@jimmykarily
Contributor

Version is used to update the Plan resource from what I see in the code. In any case, I don't see any concatenation happening: https://github.com/rancher/system-upgrade-controller/blob/f7f79ff18733b3c67ef0860db635abcf2391430f/pkg/upgrade/container/container.go#L93

spec.upgrade.image is used "as is". @nicolaspernoud try adding the tag to the image field as well and see what happens. I need to try this out too, and if that's the issue, we need to update the docs.

I wonder if that changed recently (since we now install the latest system-upgrade-controller version). There used to be some concatenation happening: rancher/system-upgrade-controller@a24fafac#diff-a5a8b751c9bff70762c3ffd1f66c5107a837dafce0e220a84b317110e88bb45dL69

but that's from 2020... Maybe the example was based on that old version?

@nicolaspernoud
Author

nicolaspernoud commented Oct 7, 2024

I actually already had the "latest" tag on the image field...

  upgrade:
    image: registry.****.eu/myimagename:latest
    command:
      - "/bin/bash"
      - "-c"
    args:
      - bash /host/run/system-upgrade/secrets/custom-script/upgrade.sh
  secrets:
    - name: custom-script
      path: /host/run/system-upgrade/secrets/custom-script

@jimmykarily jimmykarily moved this from Under review 🔍 to In Progress 🏃 in 🧙Issue tracking board Oct 7, 2024
@jimmykarily jimmykarily self-assigned this Oct 7, 2024
@jimmykarily
Contributor

Then that image doesn't have /bin/bash in it 🤷‍♂️ . You can install bash in the image or you can change the command to run something that exists.
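
To confirm which shells an upgrade image actually ships before pointing the Plan's command at one, something like the following sketch could be run against the image locally. The helper name `image_has_shell` is hypothetical. It overrides the entrypoint with the candidate path and runs a trivial command, so a missing binary shows up as a non-zero exit status from `docker run`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given shell path can be executed
# inside the image, fails otherwise (for example when the file is missing).
# Usage: image_has_shell <image> <path-to-shell>
image_has_shell() {
  image="$1"
  shell_path="$2"
  # --entrypoint replaces the image's entrypoint with the candidate shell;
  # "-c 'exit 0'" makes the shell exit immediately with status 0.
  docker run --rm --entrypoint "$shell_path" "$image" -c 'exit 0' \
    >/dev/null 2>&1
}
```

For instance, `image_has_shell registry.example.eu/myimagename:latest /bin/bash || echo "no bash"` (with a placeholder image name) would report whether `/bin/bash` is usable. If only `/bin/sh` is present, the Plan's command and script can be switched to it instead.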

@nicolaspernoud
Author

Ok, thanks for the advice.
Since the original problem is solved, I am closing the issue.
@jimmykarily : thanks a lot for your help and for your work on this amazing project.

@github-project-automation github-project-automation bot moved this from In Progress 🏃 to Done ✅ in 🧙Issue tracking board Oct 7, 2024
@jimmykarily
Contributor

Ok, thanks for the advice. Since the original problem is solved, I am closing the issue. @jimmykarily : thanks a lot for your help and for your work on this amazing project.

Thank you for your contribution @nicolaspernoud! It's impossible to keep a project polished (docs included) if users don't report the issues they find, so keep them coming!

@jimmykarily
Contributor

For future reference and for the sake of completeness, concatenation does happen here:

and I also checked with a plan like this:

---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: os-upgrade
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 1
  # This is the version (tag) of the image to upgrade to.
  version: "whatever"
  nodeSelector:
    matchExpressions:
      - {key: kubernetes.io/hostname, operator: Exists}
  serviceAccountName: system-upgrade
  cordon: false
  drain:
    force: false
    disableEviction: true
  upgrade:
    # Here goes the image which is tied to the flavor being used.
    # You can also specify your custom image stored in a public registry.
    image: quay.io/kairos/opensuse
    command:
    - "/usr/sbin/suc-upgrade"

and the image that it tried to pull was quay.io/kairos/opensuse:whatever:

  Normal   Pulling    6m36s (x4 over 8m10s)   kubelet            Pulling image "quay.io/kairos/opensuse:whatever"
  Warning  Failed     6m35s (x4 over 8m10s)   kubelet            Failed to pull image "quay.io/kairos/opensuse:whatever": rpc error: code = NotFound desc = failed to pull and unpack image "quay.io/kairos/opensused
  Warning  Failed     6m35s (x4 over 8m10s)   kubelet            Error: ErrImagePull
  Normal   BackOff    3m46s (x16 over 7m42s)  kubelet            Back-off pulling image "quay.io/kairos/opensuse:whatever"

which proves it concatenates.

TL;DR: no documentation change is needed.
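
The behavior observed above can be summarized as a small rule, sketched here in shell. This is an illustration inferred from this thread, not the controller's actual code. The only confirmed case is that an untagged image gets `:<version>` appended. Leaving an already tagged image untouched is an assumption, and `resolve_image` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical illustration of how the pulled image reference appears to be
# derived from a Plan's spec.upgrade.image and spec.version.
# Usage: resolve_image <image> <version>
resolve_image() {
  image="$1"
  version="$2"
  # Only the part after the last "/" can carry a tag; a ":" earlier in the
  # reference would be a registry port (e.g. localhost:5000/img).
  last_segment="${image##*/}"
  case "$last_segment" in
    *:*) printf '%s\n' "$image" ;;          # assumed: existing tag kept as is
    *)
      if [ -n "$version" ]; then
        printf '%s:%s\n' "$image" "$version"  # observed: version appended
      else
        printf '%s\n' "$image"
      fi
      ;;
  esac
}
```

With the plan above, `resolve_image quay.io/kairos/opensuse whatever` prints `quay.io/kairos/opensuse:whatever`, which is the reference the kubelet tried to pull.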
