Dynamically increase/decrease the etcd quota size and update the doc #984
Labels: area/disaster-recovery (Disaster recovery related), kind/enhancement (Enhancement, improvement, extension)
How to categorize this issue?
/area disaster-recovery
/kind enhancement
What would you like to be added:
When etcd exceeds its quota limit of 8GB, the usual remedy is to run compaction and defragmentation to reduce the size of the etcd database. However, these operations are not always effective, especially when the database contains many unique writes (unique keys), for example because of a bug in a workload the user has deployed.
To address this, we could dynamically adjust the etcd quota size and document the steps required to recover from this scenario.
Why is this needed:
When etcd exceeds its quota size, operators usually have to babysit that cluster. The typical solution involves manually deleting some user resources directly from etcd, followed by running compaction and defragmentation on etcd.
In this proposal, I suggest dynamically increasing the etcd quota size to stabilize etcd, allowing users to deploy the necessary fixes. Once the issue is resolved, the etcd quota size can be dynamically reduced back to its original limit.
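The increase-then-restore idea could be expressed as a small policy function. The sketch below is a hypothetical illustration, not etcd-druid's implementation: `quotaPolicy`, its fields, the step size, the cap and the 90% threshold are all assumptions. It raises the quota in fixed steps up to a hard cap while the database is near its limit, and restores the original quota only once the (defragmented) database size is comfortably below the original limit again. As far as we know, etcd reads `--quota-backend-bytes` only at startup, so actually applying a new value would require restarting the etcd members, and a raised `NOSPACE` alarm still has to be disarmed.

```go
package main

import "fmt"

const gib int64 = 1 << 30

// quotaPolicy is a hypothetical policy for temporarily raising the etcd
// quota (quota-backend-bytes) and later restoring the original value.
type quotaPolicy struct {
	Original int64   // quota the cluster normally runs with (8 GiB here)
	Step     int64   // how much to add per increase
	Max      int64   // hard upper bound for the temporary quota
	HighMark float64 // act when dbSize reaches this fraction of a quota
}

// next returns the quota to apply given the current DB size and quota.
func (p quotaPolicy) next(dbSize, current int64) int64 {
	// Stabilize: raise the quota when the DB is close to the current limit.
	if float64(dbSize) >= p.HighMark*float64(current) && current < p.Max {
		if raised := current + p.Step; raised < p.Max {
			return raised
		}
		return p.Max
	}
	// Recover: once the DB fits comfortably under the original quota again
	// (e.g. after the workload is fixed and compaction + defrag have run),
	// fall back to the original limit.
	if current > p.Original && float64(dbSize) < p.HighMark*float64(p.Original) {
		return p.Original
	}
	return current
}

func main() {
	p := quotaPolicy{Original: 8 * gib, Step: 2 * gib, Max: 12 * gib, HighMark: 0.9}

	q := p.Original
	// Simulated DB sizes: growing past the limit, then shrinking after a fix.
	for _, db := range []int64{7 * gib, 8 * gib, 9*gib + gib/2, 11 * gib, 4 * gib} {
		q = p.next(db, q)
		fmt.Printf("db=%5.2f GiB -> quota=%d GiB\n", float64(db)/float64(gib), q/gib)
	}
}
```

With an 8 GiB original quota, 2 GiB steps and a 12 GiB cap, the example keeps the quota at 8 GiB, raises it to 10 and then 12 GiB as the database grows, holds it at 12 GiB, and drops back to 8 GiB once the database shrinks to 4 GiB. Restoring only below 90% of the original quota avoids lowering the limit beneath the current database size, which would immediately trip the quota again.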