kubeadm and external etcd init error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition #2612
I saw your comments earlier on another issue, but I do not think you will get a better response on this new issue. Kubeadm as a client tries to access the API server through the LB, but it fails. It could be a temporary outage or a permanent problem. To test whether the LB works, you can deploy a small compiled Go test app that serves as a dummy API server and see if curl can reach it through the LB. This is not a kubeadm problem per se; the best place to discuss it is the support channels. /kind support
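As a rough sketch of that kind of LB check (not the exact test the commenter had in mind; the VIP and backend addresses are the ones from the setup described later in this issue):
# does the VIP accept TCP connections on the apiserver port at all?
nc -vz 192.168.1.88 6443
# does an HTTPS request get forwarded to a backend apiserver? any HTTP response
# (even 401/403) already proves the LB passes TLS traffic through
curl -k https://192.168.1.88:6443/healthz
# compare against one backend directly to isolate the LB
curl -k https://192.168.1.85:6443/healthz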
root@master01:~# systemctl status kubelet
Nov 25 17:23:36 master01.fe.me kubelet[166036]: I1125 17:23:36.507642 166036 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume "usr-local-share-ca-certificates" (UniqueName: "kubernetes.io/h>
I see some errors in the kubelet logs.
when you are seeing the error about "writing Crisocket..." in kubeadm, is the apiserver container running?
Yes, it's running and still up.
root@master01:
LB is fine.
worth looking at the apiserver logs. if the component is running and the storage backend is working, i don't see why this would fail. also, is init failing consistently or only sometimes?
if i understand the setup correctly, you have two control plane nodes managed by kubeadm, and each has an external but co-located etcd member managed by systemd on the kubeadm nodes. is that right? having the co-located members on the same nodes as "external" is not advised, unless you want to pass special config to etcd that kubeadm's local/stacked etcd does not support. also, you need 3 control plane nodes and 3 etcd members for HA.
if i understand the setup correctly, you have two control plane nodes managed by kubeadm, and each has an external but co-located etcd member managed by systemd on the kubeadm nodes. is that right?
kubeadm init --config=cluster.yaml --upload-certs --v=10
I1125 18:09:50.453292 181098 round_trippers.go:454] GET https://192.168.1.88:6443/api/v1/nodes/master01.fe.me?timeout=10s 404 Not Found in 4 milliseconds
master01.fe.me is the first master node I am initializing.
2 etcd members cannot vote (in case of failure) --> I will try with 3 etcd members.
I1125 18:09:50.453442 181098 request.go:1181] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes "master01.fe.me" not found","reason":"NotFound","details":{"name":"master01.fe.me","kind":"nodes"},"code":404}
master01.fe.me NotFound, is anything wrong?
yes, because 2 members is not HA: with 2 members the quorum is still 2, so losing either one makes etcd unavailable. use 3 if you want to have HA.
ok, but why external etcd on the same nodes?
the kubelet is responsible for creating the node object, so if that failed for some reason the object would not be available in etcd and the kube-apiserver would not be able to perform actions on it, such as patch. kubeadm requests to patch it: once kubeadm sees there is a /etc/kubernetes/kubelet.conf it assumes the node object has already been created after TLS bootstrap. kubeadm will retry to patch the socket on the node object for a while and eventually fail. make sure you have matching kubelet, apiserver and kubeadm versions.
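For reference, a minimal sketch of checking those points on the failing node, assuming kubeadm already wrote /etc/kubernetes/admin.conf:
# is the Node object registered? this is what kubeadm tries to patch with the CRI socket
kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes
# the kubeconfig kubeadm looks for after TLS bootstrap
ls -l /etc/kubernetes/kubelet.conf
# version skew check: kubelet and kubeadm (and the apiserver image) should match
kubelet --version
kubeadm version -o short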
i'd also experiment with removing:
from the kubeadm config and try again. if it also fails, then it may be related to something in the kubelet / apiserver.
I tried to init with only one node and the default config, but it still fails.
I1126 06:26:48.876863 296691 round_trippers.go:454] GET https://192.168.1.85:6443/api/v1/nodes/master01.fe.me?timeout=10s 404 Not Found in 1 milliseconds
here are a couple of things to try:
this would create a single node cluster without an LB. if it passes then the problem is the LB...probably related to HTTPS forwarding or temporary blips (10s timeout for upload config). not a kubeadm issue.
this would skip the phase that is failing (upload config). more nodes will not be able to join this cluster because of the skipped important phase. if it passes, the problem is in the kubelet; please share the kubelet logs. rough commands for both options are sketched below.
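Roughly, the two options could look like the following sketch (not exact commands from this thread; --skip-phases is the kubeadm init flag that skips a named phase):
# option 1: single control plane node, no LB, stock config
kubeadm init --kubernetes-version v1.22.4
# option 2: keep the existing config but skip the failing phase
# (other nodes will not be able to join a cluster created this way)
kubeadm init --config=cluster.yaml --skip-phases=upload-config/kubelet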
root@master01:~# kubeadm init --config=cluster2.yaml -v=10
I1126 14:48:06.011327 43988 round_trippers.go:454] GET https://192.168.1.85:6443/api/v1/nodes/master01.fe.me?timeout=10s 404 Not Found in 3 milliseconds
root@master01:~# kubeadm init --config=cluster3.yaml -v=10
I1126 14:55:21.107157 47161 request.go:1181] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes "master01.fe.me" not found","reason":"NotFound","details":{"name":"master01.fe.me","kind":"nodes"},"code":404}
try this config for option 3 instead:
also share full kubelet logs as mentioned above.
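One common way to capture them on a systemd host such as this Ubuntu setup (a sketch; adjust the unit name if your kubelet runs differently):
# full kubelet logs for the current boot, written to a file that can be attached
journalctl -b -u kubelet --no-pager > kubelet.log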
did kubeadm still throw errors with the config here?
here is the last error from kubelet.log; let me use this option.
Init with this option: Your Kubernetes control-plane has initialized successfully! What is the difference from the other options?
root@master01:~# kubectl get node |
it skips the parts in kubeadm that need the Node object to be created.
are you following this guide for installing the packages?
Yes, I installed following these steps.
in the kubelet logs i see some strange errors related to container sandboxes. i also see this:
this is an indication that the ... you can try to remove the ... then try ...
NOTE! you should use containerd or cri-o with a ... there are also errors related to the kubelet not being able to talk to ... as a side note, always make sure you call ...
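If the elided advice here is about moving the node off docker and cleaning up between attempts, a hedged sketch could be (the containerd socket path is an assumption; with --config, the socket goes into the config rather than a flag):
# clean up the previous failed attempt, pointing kubeadm at containerd
kubeadm reset -f --cri-socket unix:///run/containerd/containerd.sock
# when re-running init with --config, the socket can be set via an extra
# InitConfiguration document, for example:
#   apiVersion: kubeadm.k8s.io/v1beta3
#   kind: InitConfiguration
#   nodeRegistration:
#     criSocket: unix:///run/containerd/containerd.sock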
Hi Ivan,
root@master01:
In case I would like to set up multi-master nodes with external etcd, could you please advise on the init config?
I1127 12:19:21.354945 403780 round_trippers.go:454] GET https://192.168.1.88:6443/api/v1/nodes/master01.fe.me?timeout=10s 404 Not Found in 5 milliseconds
If containerd works for you, great...make sure you install it on all nodes.
Might be a good idea for us to update the troubleshooting guide about this docker problem.
Our HA docs are here:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
For additional questions please use the support forums:
https://github.com/kubernetes/kubeadm#support
/close
|
@neolit123: Closing this issue.
I'm hitting the same issue now; the node object is not being registered.
Init in this option:
I believe my reason was because I used --node-name $NODENAME, which was adding my hostname, and my hostname was uppercase, which I think may not be supported.
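If an uppercase hostname is the suspect, a quick check/fix before running kubeadm might be (a sketch; Kubernetes node names must be lowercase RFC 1123 subdomains):
# what the kubelet would register, lowercased
hostname | tr '[:upper:]' '[:lower:]'
# or set the hostname itself to the lowercased form
hostnamectl set-hostname "$(hostname | tr '[:upper:]' '[:lower:]')"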
Error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
init.log
root@master01:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
root@master01:~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:47:19Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
root@master01:~# etcd version
{"level":"info","ts":1637853104.719803,"caller":"etcdmain/etcd.go:72","msg":"Running: ","args":["etcd","version"]}
{"level":"warn","ts":1637853104.719844,"caller":"etcdmain/etcd.go:74","msg":"failed to verify flags","error":"'version' is not a valid flag"}
root@master01:~#
I plan to set up with the boxes below:
master01 --> 192.168.1.85
master02 --> 192.168.1.86
haproxy01, keepalived --> 192.168.1.87 , VIP --> 192.168.1.88
worker01 --> 192.168.1.90
worker02 --> 192.168.1.91
Install docker, kubelet, kubeadm and kubectl on all cluster nodes.
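For reference, the package install at the time of v1.22 roughly followed the official Debian/Ubuntu steps below (a sketch run as root; the apt.kubernetes.io repository shown here was the one current for this release and has since been deprecated):
apt-get update && apt-get install -y apt-transport-https ca-certificates curl
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl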
HAProxy setup
root@haproxy01:~# cat /etc/haproxy/haproxy.cfg
global
...
defaults
...
frontend k8s_frontend
bind 192.168.1.88:6443
option tcplog
mode tcp
default_backend k8s_backend
backend k8s_backend
mode tcp
balance roundrobin
option tcp-check
server master01 192.168.1.85:6443 check fall 3 rise 2
server master02 192.168.1.86:6443 check fall 3 rise 2
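A quick sanity check of this config on the LB host (a sketch, assuming haproxy runs under systemd):
haproxy -c -f /etc/haproxy/haproxy.cfg   # validate the config file
systemctl restart haproxy
ss -tlnp | grep 6443                     # confirm the frontend is bound on the VIP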
Keepalived setup
root@haproxy01:~# cat /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy { # Requires keepalived-1.1.13
script "killall -0 haproxy" # cheaper than pidof
interval 2 # check every 2 seconds
weight 2 # add 2 points of prio if OK
}
vrrp_instance VI_1 {
interface enp0s3
state MASTER
virtual_router_id 51
priority 100 # 101 on master, 100 on backup
virtual_ipaddress {
192.168.1.88 brd 192.168.1.255 dev enp0s3 label enp0s3:1
}
track_script {
chk_haproxy
}
}
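To confirm the VIP actually lands on the interface and is reachable (a sketch):
ip addr show enp0s3 | grep 192.168.1.88   # the VIP should appear on the MASTER
ping -c 2 192.168.1.88                    # and answer from the cluster nodes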
Generating the TLS certificates
$ vim ca-config.json
{
"signing": {
"default": {
"expiry": "8760h"
},
"profiles": {
"kubernetes": {
"usages": ["signing", "key encipherment", "server auth", "client auth"],
"expiry": "8760h"
}
}
}
}
$ vim ca-csr.json
{
"CN": "Kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "IE",
"L": "Cork",
"O": "Kubernetes",
"OU": "CA",
"ST": "Cork Co."
}
]
}
$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca
$ vim kubernetes-csr.json
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "IE",
"L": "Cork",
"O": "Kubernetes",
"OU": "Kubernetes",
"ST": "Cork Co."
}
]
}
$ cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=192.168.1.85,192.168.1.86,192.168.1.87,192.168.1.88,192.168.1.89,127.0.0.1,kubernetes.default \
  -profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes
scp ca.pem kubernetes.pem kubernetes-key.pem root@192.168.1.85:/etc/etcd/
scp ca.pem kubernetes.pem kubernetes-key.pem root@192.168.1.86:/etc/etcd/
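Before pointing etcd and kubeadm at these certificates, it may be worth confirming the SANs came out as intended (a sketch):
openssl x509 -in kubernetes.pem -noout -text | grep -A1 'Subject Alternative Name'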
Etcd setup on the two masters
root@master01:~# cat /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos
[Service]
ExecStart=/usr/local/bin/etcd \
  --name 192.168.1.85 \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://192.168.1.85:2380 \
  --listen-peer-urls https://192.168.1.85:2380 \
  --listen-client-urls https://192.168.1.85:2379,http://127.0.0.1:2379 \
  --advertise-client-urls https://192.168.1.85:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster 192.168.1.85=https://192.168.1.85:2380,192.168.1.86=https://192.168.1.86:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
root@master02:~# cat /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos
[Service]
ExecStart=/usr/local/bin/etcd \
  --name 192.168.1.86 \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://192.168.1.86:2380 \
  --listen-peer-urls https://192.168.1.86:2380 \
  --listen-client-urls https://192.168.1.86:2379,http://127.0.0.1:2379 \
  --advertise-client-urls https://192.168.1.86:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster 192.168.1.85=https://192.168.1.85:2380,192.168.1.86=https://192.168.1.86:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
root@master01:~# ETCDCTL_API=3 etcdctl member list
8a137d8e3cb900a, started, 192.168.1.86, https://192.168.1.86:2380, https://192.168.1.86:2379, false
75b9a29ae5a417ae, started, 192.168.1.85, https://192.168.1.85:2380, https://192.168.1.85:2379, false
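The member list alone does not prove client TLS works end to end; a hedged health check using the same certificates the kubeadm config references:
ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.1.85:2379,https://192.168.1.86:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  endpoint health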
Prepare cluster config file
root@master01:~# cat cluster.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    caFile: /etc/etcd/ca.pem
    certFile: /etc/etcd/kubernetes.pem
    keyFile: /etc/etcd/kubernetes-key.pem
    endpoints:
networking:
  dnsDomain: cluster.local
  podSubnet: 10.30.0.0/24
  serviceSubnet: 10.96.0.0/12
kubernetesVersion: v1.22.4
controlPlaneEndpoint: 192.168.1.88:6443
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: "RBAC"
  certSANs:
controllerManager: {}
scheduler: {}
certificatesDir: /etc/kubernetes/pki
imageRepository: k8s.gcr.io
clusterName: kubernetes
dns: {}
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
root@master01:~#
root@master01:~# kubeadm init --config=cluster.yaml