Endpoint status and snapshot API doesn't work after the quorum loss and live etcd instances restarted #18511
Asked by zhuchenwang in Q&A (unanswered)
Hi @zhuchenwang - Thanks for your question. Please refer to https://etcd.io/docs/v3.5/op-guide/recovery/#snapshotting-the-keyspace for guidance on etcd disaster recovery. Specifically:
In situations where the snapshot API is not serving, you can copy the db files directly and use them later to restore the cluster, as covered in the guide.
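As a rough illustration of the file-copy fallback, here is a minimal Python sketch. It assumes the default data directory layout, where the backend file sits at `member/snap/db` under the etcd data directory, and that the copy is taken while the member is not being written to. The function name `copy_db_file` and the paths are hypothetical, not part of any etcd tooling.

```python
import shutil
from pathlib import Path


def copy_db_file(data_dir: str, dest: str) -> Path:
    """Copy the backend db file out of an etcd data directory.

    Assumes the default layout: <data-dir>/member/snap/db.
    """
    src = Path(data_dir) / "member" / "snap" / "db"
    if not src.is_file():
        raise FileNotFoundError(f"no db file at {src}")
    dest_path = Path(dest)
    # Create the backup directory if it does not exist yet.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps, which helps when comparing copies later.
    shutil.copy2(src, dest_path)
    return dest_path
```

Per the recovery guide, a db file copied this way carries no integrity hash, so the restore step needs `--skip-hash-check`.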
Hi team,
We deploy our own etcd cluster as a StatefulSet in a Kubernetes cluster. If there is a quorum loss, we spin up a recovery job that calls the endpoint status API on the live etcd pods to find the one with the highest index, calls the snapshot API on that pod to take a snapshot, and then restores from that snapshot.
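To make the selection step concrete, here is a minimal Python sketch. It assumes the job collects status with `etcdctl endpoint status -w json` and that each entry carries an `Endpoint` field and a `Status.raftIndex` field; the function name `pick_snapshot_endpoint` and the pod hostnames are hypothetical. It only helps when the status call actually returns.

```python
import json


def pick_snapshot_endpoint(status_json: str) -> str:
    """Return the endpoint whose member reports the highest raft index."""
    entries = json.loads(status_json)
    if not entries:
        raise ValueError("no endpoint status entries")
    # Treat a missing raftIndex as 0 so such a member is never preferred.
    best = max(entries, key=lambda e: int(e["Status"].get("raftIndex", 0)))
    return best["Endpoint"]


# Hypothetical output from two surviving pods.
sample = json.dumps([
    {"Endpoint": "http://etcd-0.etcd:2379", "Status": {"raftIndex": 1200}},
    {"Endpoint": "http://etcd-1.etcd:2379", "Status": {"raftIndex": 1342}},
])
print(pick_snapshot_endpoint(sample))  # http://etcd-1.etcd:2379
```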
This worked well until we found that, after a quorum loss followed by a restart of the remaining live etcd pods, the endpoint status API was not available because the etcd pod wanted to push its local configuration but there was no quorum. The snapshot API also didn't work.
Is this expected behavior? Any suggestions for the recovery process would also be appreciated.
Thanks,
Zhuchen