In k8s, resources and applications are maintained in the same apiserver-etcd. Node status, heartbeats, the node lifecycle controller, and the scheduler all connect to that same apiserver-etcd, because that is where the cluster state lives. As a side effect, all controllers and the scheduler are effectively "frozen" when the api server is down or disconnected: leader election cannot find a leader for them.
This is important because it prevents controllers from acting while the cluster state is unknown.
In Arktos, apps and resources are separated into two sets of api server-etcd, in the TP and RP clusters. The scheduler uses the TP cluster for its leader election, so as long as the TP is up, the scheduler keeps functioning.
This introduces an issue: when an RP api server is disconnected, the scheduler continues to place pods on the nodes managed by that disconnected or downed RP, regardless of whether those nodes are actually alive. This can put the placed pods in an unexpected situation, depending on the real node status.
The desired behavior is that the system does not place any pods on an RP that is down or disconnected. The scheduler should keep functioning and schedule pods to the other RPs it is connected to (in a multiple-RP env).
yb01 changed the title from "Scheduler should be in paused state when RP api server is down or disconnected" to "Scheduler should not place pods to the downed or disconnected RP" on Apr 16, 2022