Dynamically increase/decrease the etcd quota size and update the doc #984

ishan16696 · 2025-01-22T09:06:36Z

How to categorize this issue?

/area disaster-recovery
/kind enhancement

What would you like to be added:
When etcd exceeds its 8GB quota limit, the typical remedy is to run compaction and defragmentation to shrink the etcd DB. These operations are not always effective, however, especially when the database contains many unique writes (unique keys), for example because of a bug in a workload the user has deployed. Compaction only discards old revisions of keys, so a database filled with distinct live keys does not shrink much.
To address this, we should consider dynamically adjusting the etcd quota size and documenting the steps required to recover from this scenario.
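
As a rough illustration of when compaction and defragmentation stop helping, here is a minimal sketch of the decision. The function `recommend_action`, the `DbStatus` type, the action names, and the 90% threshold are all hypothetical and not part of etcd-druid; the two size fields mirror the `dbSize` and `dbSizeInUse` values that etcd reports in its endpoint status.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class DbStatus:
    db_size: int         # bytes allocated on disk (etcd's dbSize)
    db_size_in_use: int  # bytes logically in use (etcd's dbSizeInUse)


def recommend_action(status, quota_bytes, compacted, near_quota_percent=90):
    """Hypothetical heuristic: pick the next recovery step for an etcd member.

    compacted says whether old revisions have already been compacted.
    """
    threshold = quota_bytes * near_quota_percent // 100
    if status.db_size < threshold:
        return "none"  # comfortably below the quota
    if not compacted:
        return "compact"  # drop old revisions first
    if status.db_size_in_use < threshold:
        return "defragment"  # free pages exist, defragmentation can reclaim them
    # Live data alone is close to the quota: compaction and
    # defragmentation cannot help, so the quota itself has to grow.
    return "raise-quota"
```

With an 8GiB quota, a compacted database whose in-use size is still about 8GiB falls through to `raise-quota`, which is the case this issue is about.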

Why is this needed:
When etcd exceeds its quota size, operators usually have to babysit the cluster. The typical solution involves manually deleting some user resources directly from etcd and then running the compaction and defragmentation commands on etcd.
In this proposal, I suggest dynamically increasing the etcd quota size to stabilize etcd, allowing users to deploy the necessary fixes. Once the issue is resolved, the etcd quota size can be dynamically reduced back to its original limit.
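
The raise-then-restore flow could be sketched roughly as follows. This is only an assumption about how the sizing might work: `temporary_quota`, `can_restore`, the 25% headroom, and the 80% restore margin are hypothetical, and how the new value is actually applied to etcd (for example through its `--quota-backend-bytes` setting) is outside the sketch.

```python
GIB = 1024 ** 3


def temporary_quota(original_quota, db_size, headroom_percent=25):
    """Hypothetical sizing rule: leave headroom above the current DB size,
    but never go below the original quota."""
    raised = db_size * (100 + headroom_percent) // 100
    return max(original_quota, raised)


def can_restore(original_quota, db_size, margin_percent=80):
    """Only shrink the quota back once the DB sits safely below the original
    limit; otherwise etcd would be over quota again right away."""
    return db_size <= original_quota * margin_percent // 100
```

For a database that has grown to the full 8GiB, `temporary_quota(8 * GIB, 8 * GIB)` gives 10GiB, and `can_restore` stays false until cleanup, compaction, and defragmentation bring the DB size down to 6.4GiB or less.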

gardener-robot added the area/disaster-recovery (Disaster recovery related) and kind/enhancement (Enhancement, improvement, extension) labels on Jan 22, 2025
