Scheduler should not place pods to the downed or disconnected RP #1420

yb01 · 2022-04-16T17:59:09Z

In k8s, resources and applications are maintained in the same apiserver-etcd. Node status, heartbeats, the node lifecycle controller, and the scheduler all connect to that same apiserver-etcd, since the cluster state lives there. As a side effect, all controllers and the scheduler are in a "frozen" state when the API server is down or disconnected: leader election cannot find a leader for them.

This is important because it keeps controllers from acting when the cluster state is unknown.

In Arktos, apps and resources are separated into two sets of API server-etcd, in the TP and RP clusters. The scheduler uses the TP cluster for its leader election, so as long as the TP is up, the scheduler keeps functioning.

This introduces an issue: when an RP API server is disconnected, regardless of whether its nodes are actually alive, the scheduler continues to place pods on the nodes managed by that disconnected or downed RP. This can cause unexpected situations for the placed pods, depending on the node status.

The desired behavior is that the system does not place any pods on an RP that is disconnected. The scheduler can still function and schedule pods to the other RPs it connects to (in a multi-RP environment).

yb01 changed the title ~~Scheduler should be in paused state when RP api server is down or disconnected~~ Scheduler should not place pods to the downed or disconnected RP Apr 16, 2022
