kube-up.sh containerd and Daemonset support #1366

sonyafenge
Collaborator

What type of PR is this?

/kind feature

What this PR does / why we need it:
Part of the merge from poc-2022-01-30 to master, including:

  1. Add node tags for the RP minion
  2. Add a workaround for the klog issue that causes CoreDNS to crash
  3. Add DaemonSet support to kube-up.sh for scale-out
  4. Change kube-up.sh to use containerd as the default runtime

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Test Scenarios:

  1. Kubeup (scale up) + kubemark (scale up): successful
  2. Kubeup (scale up) + kubemark (scale out 1*1): successful
  3. Kubeup (scale out 1*1): successful except for known issues (metrics-server, coredns... pod failures)

Does this PR introduce a user-facing change?:


rajansandeep and others added 3 commits February 16, 2022 22:07
* scaleout tp server enable  daemonset

* rp kubelet add tenant-server-kubeconfig information

* remove daemonset from RP server
…a#1273)

* add install-docker and install-containerd for ubuntu

* change default runtime to containerd

* remove unused LOG_DUMP_SYSTEMD_SERVICES from config
@sonyafenge sonyafenge force-pushed the master-mizar-intergration-preparation-new branch from ffb21a5 to e35ca94 Compare February 16, 2022 22:08
@@ -129,6 +129,8 @@ spec:
- name: config-volume
mountPath: /etc/coredns
readOnly: true
- name: tmp
Collaborator

@yb01 yb01 Feb 16, 2022

Which file does CoreDNS need this mount for?

Collaborator

From the individual commit, I assume this is for the logs.
This seems a bit of a hack, especially for master code.
Normally the pod app should be able to get its logs out. We should leverage that, or use the termination log if this is for pod crash cases.

Collaborator Author

This is a cherry-pick from community Kubernetes: kubernetes/kubernetes#82128.
We did hit this kind of error before this fix.

Collaborator

Thanks for the info.

This seems to be a temporary workaround for a klog issue in k8s releases 1.14 and 1.15: the change only lived in those two releases, and k8s 1.16 did not take it.

We probably should not take it, and should instead check how the issue was eventually fixed.

Is this change helping us? The root issue is that CoreDNS cannot access the API server, hence the crash-restart loop. If the root issue is addressed, we won't need this fix, i.e. we can afford even a short restart of the DNS service for NOW.

Collaborator Author

At least for now, it helps us. Even once the root cause of the coredns failure is addressed, I'm not sure what will happen if we remove this.

Collaborator

As we discussed, let's add a comment on this change so that CoreDNS in Arktos can be updated later if resources are available.


# Override to latest versions of containerd and runc
systemctl stop containerd
if [[ ! -z "${UBUNTU_INSTALL_CONTAINERD_VERSION:-}" ]]; then
Collaborator

I recall we have an internal version of containerd for the Mizar CNI. Can you double-check with Hongwei whether we need the internal version?
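
As a side note on the guard in the diff above: `! -z` tests for a non-empty value, and `-n` is the direct spelling of the same test. A minimal sketch of that check, assuming a hypothetical helper name `containerd_version_to_install` and a made-up `distro-default` fallback label (neither is from this PR); only `UBUNTU_INSTALL_CONTAINERD_VERSION` comes from the diff:

```shell
#!/bin/sh
# Treat references to unset variables as errors, so the ":-" default matters.
set -u

# Hypothetical helper (not part of kube-up.sh): prints the version to install.
# "distro-default" is an illustrative placeholder, not a real package version.
containerd_version_to_install() {
  # "${VAR:-}" expands to an empty string when VAR is unset, which keeps
  # the test safe under "set -u"; -n is true only for a non-empty value.
  if [ -n "${UBUNTU_INSTALL_CONTAINERD_VERSION:-}" ]; then
    echo "${UBUNTU_INSTALL_CONTAINERD_VERSION}"
  else
    echo "distro-default"
  fi
}
```

The same `-n` test works unchanged inside bash's `[[ ... ]]`, so the override stays opt-in: the helper only reports a pinned version when the variable is set and non-empty.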

Collaborator

@yb01 yb01 left a comment

Overall looks good. The way the CoreDNS logs are obtained does not seem to meet the bar for master; please investigate.

Collaborator

@yb01 yb01 left a comment

/lgtm.

Please annotate the CoreDNS workaround so we can track it and fix it later.

@sonyafenge
Collaborator Author

For the commit “add workaround for klog issue causing CoreDNS to crash”, I did the following investigation:

  1. Started a run on the POC branch with this commit removed. We then got the DNS error:
{"log":"E0218 22:16:26.720159       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?limit=500\u0026resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout\n","stream":"stdout","time":"2022-02-18T22:16:26.720418911Z"}
{"log":"E0218 22:16:26.720159       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Service: Get https://10.0.0.1:443/api/v1/services?limit=500\u0026resourceVersion=0: dial tcp 10.0.0.1:443: i/o timeout\n","stream":"stdout","time":"2022-02-18T22:16:26.720476197Z"}
{"log":"log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-7778f74bb8-f77nl.unknownuser.log.ERROR.20220218-221626.1: no such file or directory\n","stream":"stdout","time":"2022-02-18T22:16:26.720483203Z"}
$ kubectl get pods -AT --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/cluster/kubeconfig.tp-1 -owide | grep coredns
system   kube-system   coredns-7778f74bb8-f77nl                                  2464322247357229799   0/1     CrashLoopBackOff    7          42m   50.248.0.24   sonyaperf1-021822-rp-1-minion-group-7wbj   <none>           <none>
system   kube-system   coredns-default-sonyaperf1-021822-tp-1-7dcc7dbc99-fdtcd   267252184356384451    0/1     CrashLoopBackOff    7          41m   50.248.0.23   sonyaperf1-021822-rp-1-minion-group-5t8q   <none>           <none>
  2. Reviewed the details of this fix from the community. It is called a “workaround” because this is a CoreDNS issue that has been fixed in CoreDNS 1.4+.
    Arktos is still using coredns:1.3.1, so we need to either pick up this fix or upgrade CoreDNS to 1.4+. The community used this fix until CoreDNS was upgraded to 1.4+ in Kubernetes 1.16+.

  3. Checked the community Kubernetes CoreDNS upgrade history and found that these PRs may be necessary:
    https://github.com/kubernetes/kubernetes/pull/78033
    https://github.com/kubernetes/kubernetes/pull/82093

  4. Opened an Arktos issue to track the CoreDNS upgrade:
    upgrade coredns to v1.4+ #1374
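
The two checks used in this investigation could be scripted roughly as follows. This is a sketch only: the function names `has_klog_tmp_failure` and `version_lt` are hypothetical, the matched string is taken from the log line quoted above, the `1.4.0` threshold stands in for the "1.4+" mentioned in this comment, and `sort -V` assumes GNU coreutils or a compatible `sort`.

```shell
#!/bin/sh
set -u

# Hypothetical helper: succeeds if the given log text contains the klog
# failure seen above (klog exits when it cannot create its log file).
has_klog_tmp_failure() {
  printf '%s\n' "$1" | grep -q 'log: exiting because of error: log: cannot create log'
}

# Hypothetical helper: succeeds if version $1 is strictly lower than $2.
# Relies on "sort -V" (version sort) to order dotted version strings.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: decide whether a CoreDNS version still predates the upstream fix.
coredns_needs_workaround() {
  version_lt "$1" "1.4.0"
}
```

With these definitions, `coredns_needs_workaround 1.3.1` succeeds, which matches the conclusion here that Arktos either keeps the workaround or upgrades CoreDNS.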

@zmn223
Collaborator

zmn223 commented Feb 19, 2022

/lgtm

@zmn223
Collaborator

zmn223 commented Feb 19, 2022

/approve

@centaurus-cloud-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yb01, zmn223

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@centaurus-cloud-bot centaurus-cloud-bot merged commit 30f1d70 into CentaurusInfra:master Feb 19, 2022