Track control plane join race conditions via kubeadm #2050
Labels
kind/bug
Categorizes issue or PR as related to a bug.
priority/backlog
Higher priority than priority/awaiting-more-evidence.
Kubeadm has a bug when joining multiple control plane nodes simultaneously. Occasionally, a control plane node will fail to join because not enough etcd members are ready.
We can work around this by setting a concurrency limit of 1 on the controller that is responsible for joining the control plane nodes (usually the infrastructure controller). This fixes the issue by making each node join a blocking operation, so only one node can join at a time. If users set a concurrency limit > 1, a control plane join has about a 20% chance of failing when two or more control plane nodes attempt to join a cluster simultaneously.
CAPI is able to work around this by retrying the join after a failure, but there will be ominous logs and a slowdown in how long it takes all control plane nodes to become ready.
Let's track the work in kubernetes/kubeadm#1793, which will improve simultaneous control plane joining.
/kind bug
/milestone Next
/priority backlog