kube-up.sh containerd and Daemonset support #1366

sonyafenge · 2022-02-16T21:41:49Z

What type of PR is this?

/kind feature

What this PR does / why we need it:
part of merge from poc-2022-01-30 to master, including:

add node tags for rp minion
add workaround for klog issue causing CoreDNS to crash
kube-up.sh support Daemonset for scale-out
Change kube-up.sh to use Containerd as default runtime

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Test Scenarios:

Kubeup (scale up) + kubemark (scale up): successful
Kubeup (scale up) + kubemark (scale out 1*1): successful
Kubeup (scale out 1*1): successful except for known issues (metrics-server, coredns... pod failures)

Does this PR introduce a user-facing change?:

* scaleout tp server enable daemonset * rp kubelet add tenant-server-kubeconfig information * remove daemonset from RP server

…a#1273) * add install-docker and install-containerd for ubuntu * change default runtime to contanerd * remove unused LOG_DUMP_SYSTEMD_SERVICES from config

yb01 · 2022-02-16T23:48:57Z

cluster/addons/dns/coredns/coredns.yaml.base

@@ -129,6 +129,8 @@ spec:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
+        - name: tmp


What file does coredns use this mount for?

From the individual commit, I assume this is for the logs.
This seems a bit of a hack, especially for master code.
Normally the pod app should be able to get its logs out. We should leverage that, or use the termination log if this is for pod crash cases.

This is a cherry-pick from community Kubernetes: kubernetes/kubernetes#82128
We did hit this kind of error before this fix.

Thanks for the info.

This seems to be a temporary workaround for a klog issue in k8s releases 1.14 and 1.15: the change was short-lived and existed only in those two releases, and k8s 1.16 didn't take it.

We probably should not take it, and should instead check how the issue was eventually fixed.

Is this change helping us? The root issue is that coredns cannot access the API server, hence the crash-restart. If the root issue is addressed, we won't need this fix, i.e. we can afford even a short restart of the DNS service for NOW.

At least for now, it helps us. Once the root cause of the coredns failure is addressed, I'm not sure what will happen if we remove this.

As we discussed, let's add a comment on this change for a future update to coredns in Arktos, if resources are available.

yb01 · 2022-02-17T00:09:45Z

cluster/gce/gci/configure.sh

+
+  # Override to latest versions of containerd and runc
+  systemctl stop containerd
+  if [[ ! -z "${UBUNTU_INSTALL_CONTAINERD_VERSION:-}" ]]; then


I recall we have an internal version of containerd for the Mizar CNI. Can you double-check with Hongwei whether we need the internal version?
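
For readers of this hunk, a minimal sketch of the optional-override pattern it relies on: `${VAR:-}` expands to an empty string when the variable is unset, so the test stays safe even with `set -u`. The helper name `resolve_containerd_version` and the `DEFAULT_CONTAINERD_VERSION` placeholder are hypothetical and not taken from configure.sh; only `UBUNTU_INSTALL_CONTAINERD_VERSION` appears in the diff.

```shell
set -eu

# Hypothetical placeholder for "whatever the image already ships".
DEFAULT_CONTAINERD_VERSION="distro-default"

# Hypothetical helper: print the containerd version that would be installed.
resolve_containerd_version() {
  # "${VAR:-}" yields "" when VAR is unset, so this does not trip `set -u`.
  if [ -n "${UBUNTU_INSTALL_CONTAINERD_VERSION:-}" ]; then
    echo "${UBUNTU_INSTALL_CONTAINERD_VERSION}"
  else
    echo "${DEFAULT_CONTAINERD_VERSION}"
  fi
}
```

If an internal containerd build is needed for Mizar, an override of this kind would presumably be where it gets selected, but that is an assumption rather than something this PR states.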

yb01

Overall looks good. The way the coredns log is obtained does not seem to meet the bar for master; please investigate.

yb01

/lgtm.

Please annotate the coredns workaround so we have a chance to track it and fix it.

sonyafenge · 2022-02-19T00:49:22Z

For the commit “add workaround for klog issue causing CoreDNS to crash”, I did the investigation below:

Started a run using the POC branch with this commit removed. We then got this DNS error:

{"log":"E0218 22:16:26.720159       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?limit=500\u0026resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout\n","stream":"stdout","time":"2022-02-18T22:16:26.720418911Z"}
{"log":"E0218 22:16:26.720159       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?limit=500\u0026resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout\n","stream":"stdout","time":"2022-02-18T22:16:26.720476197Z"}
{"log":"log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-7778f74bb8-f77nl.unknownuser.log.ERROR.20220218-221626.1: no such file or directory\n","stream":"stdout","time":"2022-02-18T22:16:26.720483203Z"}

$ kubectl get pods -AT --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/cluster/kubeconfig.tp-1 -owide | grep coredns
system   kube-system   coredns-7778f74bb8-f77nl                                  2464322247357229799   0/1     CrashLoopBackOff    7          42m   50.248.0.24   sonyaperf1-021822-rp-1-minion-group-7wbj   <none>           <none>
system   kube-system   coredns-default-sonyaperf1-021822-tp-1-7dcc7dbc99-fdtcd   267252184356384451    0/1     CrashLoopBackOff    7          41m   50.248.0.23   sonyaperf1-021822-rp-1-minion-group-5t8q   <none>           <none>
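
To tell this klog failure apart from the underlying API server timeout when scanning container logs, something like the following sketch may help. The function name `has_klog_tmp_failure` is hypothetical; the matched string is copied from the last log entry above.

```shell
# Hypothetical helper: succeeds when the given log text contains the klog
# "cannot create log" failure under /tmp seen in the coredns container log.
has_klog_tmp_failure() {
  printf '%s\n' "$1" |
    grep -qF 'log: exiting because of error: log: cannot create log: open /tmp/'
}
```

The i/o timeout entries alone do not match, so a pod that merely cannot reach the API server is not flagged.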

Reviewed the details of this fix from the community. It is called a “workaround” because this is a coredns issue that has been fixed in coredns 1.4+.
Arktos is still using coredns:1.3.1, so we need to either pick up this fix or upgrade coredns to 1.4+. The community used this fix until coredns was upgraded to 1.4+ in Kubernetes 1.16+.
Checked the community Kubernetes coredns upgrade history and found that these PRs may be necessary:
https://github.com/kubernetes/kubernetes/pull/78033
https://github.com/kubernetes/kubernetes/pull/82093
Opened an Arktos issue to track the coredns upgrade:
upgrade coredns to v1.4+ #1374
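
As a rough aid for tracking when the workaround can be dropped, here is a sketch that checks a coredns version against the 1.4 threshold stated above. The threshold is taken from this comment as given; both function names are hypothetical, and `sort -V` assumes GNU coreutils or a compatible `sort`.

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
version_ge() {
  # After a version sort, the smaller of the two comes first; $1 >= $2
  # exactly when that smaller value is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeeds when the given coredns version is older
# than 1.4.0, i.e. when the /tmp workaround would still be needed.
needs_tmp_workaround() {
  # Strip an optional leading "v" (e.g. "v1.3.1" -> "1.3.1").
  ! version_ge "${1#v}" "1.4.0"
}
```

With the version Arktos uses today, `needs_tmp_workaround 1.3.1` succeeds, which matches the decision to keep the workaround until #1374 is done.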

zmn223 · 2022-02-19T02:54:48Z

/lgtm

zmn223 · 2022-02-19T02:55:26Z

/approve

centaurus-cloud-bot · 2022-02-19T02:55:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yb01, zmn223

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [zmn223]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

add node tags for rp minion

4ce170c

centaurus-cloud-bot added the size/L label Feb 16, 2022

rajansandeep and others added 3 commits February 16, 2022 22:07

add workaround for klog issue causing CoreDNS to crash

cae122b

kube-up.sh support Daemonset for scale-out (CentaurusInfra#1271)

0da4074

* scaleout tp server enable daemonset * rp kubelet add tenant-server-kubeconfig information * remove daemonset from RP server

Change kube-up.sh to use Containerd as default runtime (CentaurusInfr…

e35ca94

…a#1273) * add install-docker and install-containerd for ubuntu * change default runtime to contanerd * remove unused LOG_DUMP_SYSTEMD_SERVICES from config

sonyafenge force-pushed the master-mizar-intergration-preparation-new branch from ffb21a5 to e35ca94 Compare February 16, 2022 22:08

sonyafenge requested review from yb01, Sindica, h-w-chen and q131172019 February 16, 2022 23:19

yb01 reviewed Feb 16, 2022

yb01 reviewed Feb 17, 2022

yb01 approved these changes Feb 18, 2022

sonyafenge mentioned this pull request Feb 19, 2022

upgrade coredns to v1.4+ #1374

Open

Merge branch 'master' into master-mizar-intergration-preparation-new

8e589e9

centaurus-cloud-bot assigned zmn223 Feb 19, 2022

centaurus-cloud-bot added the lgtm label Feb 19, 2022

centaurus-cloud-bot added the approved label Feb 19, 2022

centaurus-cloud-bot merged commit 30f1d70 into CentaurusInfra:master Feb 19, 2022
