We're dealing with the issue of concurrent reconciliations when TrinoCluster resources change. This occurs, for example, when a catalog applied to the cluster matches the catalog label selector of more than one cluster (sketched below), or when all TrinoCluster resources change at the same time because they are configured in custom Helm wrappers.
Since we use Argo CD for continuous deployment, we are not able to change clusters or upsert catalogs one after another manually.
We have not made progress with trino-lb (#490) yet, but I'm sure that even with trino-lb running, this would cause outages every time the TrinoCluster resources are (re-)configured or catalogs are upserted. Unfortunately, running Trino in a highly available way is mission-critical for our production scenario.
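For illustration, a minimal sketch of the matching scenario, assuming the Stackable trino-operator CRD layout (all names and labels here are invented, and the exact field structure may differ between operator versions):

```yaml
# One TrinoCatalog whose labels are matched by the catalogLabelSelector of
# several TrinoClusters: applying it triggers a reconciliation (and rolling
# restart) of all matching clusters at once.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: shared-catalog          # invented name
  labels:
    trino: shared               # matched by both clusters below
spec:
  connector:
    tpch: {}                    # placeholder connector
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: trino-a                 # invented name; trino-b is defined the same way
spec:
  clusterConfig:
    catalogLabelSelector:
      matchLabels:
        trino: shared           # same selector on trino-b, so both clusters
                                # reconcile concurrently
```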
Possible solution: sequential reconciliation
Introduce a flag for the operator (other product operators might be affected as well) that enables sequential reconciliations in a queue style instead of the parallelized reconciliations that take all clusters offline at the same time; a hypothetical sketch follows below.
A disadvantage might be that a single broken cluster resource blocks the whole reconciliation queue until the resource is fixed manually.
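Purely as a hypothetical sketch of the proposed flag (the env var name and value are invented; no such setting exists in the operator today), configured on the operator's Deployment:

```yaml
# Hypothetical: RECONCILE_CONCURRENCY does not exist in the operator;
# it only illustrates the shape the proposal could take.
spec:
  template:
    spec:
      containers:
        - name: trino-operator
          env:
            - name: RECONCILE_CONCURRENCY   # invented name
              value: "1"                    # reconcile one TrinoCluster at a time
```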
Possible solution: PodDisruptionBudget (PDB)
We already defined a PDB along the lines of the sketch below to make sure one coordinator per Kubernetes cluster stays available. Unfortunately the PDB is ignored and all coordinators get killed concurrently. @maltesander and @sbernauer already mentioned that the operator issues delete operations instead of evictions, and PDBs are only honored by the eviction API, which is why the budget has no effect here. Feel free to edit or add further details.
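A minimal sketch of such a PDB (the selector labels are illustrative and depend on how the coordinator pods are labelled):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: trino-coordinator-pdb                  # illustrative name
spec:
  minAvailable: 1                              # keep at least one coordinator up
  selector:
    matchLabels:
      app.kubernetes.io/name: trino            # illustrative labels
      app.kubernetes.io/component: coordinator
```

Note that even a correct PDB only protects against evictions (e.g. node drains via the eviction API), not against direct pod deletes, which is why it is bypassed during reconciliation.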
Seems like somebody is feeling similar pain with Elasticsearch: kubernetes/kubernetes#91808 (comment)