Is your feature request related to a problem?
This is a change required to address a problem with the Policy Server.
Currently, if one of the defined policies cannot be loaded (for example, because it cannot be downloaded from the remote registry, or because the user provided invalid settings), the Policy Server process exits with an error.
Inside Kubernetes, the Pod running the Policy Server is restarted a number of times and is then left in a CrashLoopBackOff state. The only way to recover from this situation is to have someone look into the error message of the Policy Server Pod and fix the issue.
This behavior is dangerous. When rolling out a new policy (or making any change to the existing ones), the new Policy Server Pods could end up in this broken state. The old Pods, still running with the previous working configuration, will disappear if something happens to the node where they are scheduled.
Because of that, it's possible to end up with a broken cluster: all incoming admission requests are rejected because there are no working instances of the Policy Server.
Solution you'd like
Instead of exiting with an error, the Policy Server should boot normally, and Kubewarden should report the error status of the affected Policy back to the user.
Currently, the controller is in charge of changing the status of a Policy to Active and of configuring the webhook. Instead, the controller should wait for the PolicyServer to report back whether the policy was initialized successfully or failed to initialize, and act accordingly.
In the scenario of more than one PolicyServer replica, the aggregated status should be considered, similar to what Kubernetes does for the replicas/readyReplicas fields of a Deployment.
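For illustration, here is a minimal Go sketch of how such an aggregation could look, assuming each PolicyServer replica reports per-policy initialization results. All names here (ReplicaReport, AggregatedStatus, Aggregate) are hypothetical and not part of the Kubewarden codebase:

```go
package status

// ReplicaReport is a hypothetical per-replica report for a single policy.
type ReplicaReport struct {
	Replica string // e.g. the Pod name of the PolicyServer replica
	Ready   bool   // true if the policy was initialized successfully
	Error   string // initialization error, empty when Ready is true
}

// AggregatedStatus mirrors the replicas/readyReplicas idea of a Deployment.
type AggregatedStatus struct {
	Replicas      int
	ReadyReplicas int
	Errors        []string
}

// Aggregate folds the per-replica reports into a single status that the
// controller could then write into the Policy CRD.
func Aggregate(reports []ReplicaReport) AggregatedStatus {
	agg := AggregatedStatus{Replicas: len(reports)}
	for _, r := range reports {
		if r.Ready {
			agg.ReadyReplicas++
			continue
		}
		agg.Errors = append(agg.Errors, r.Replica+": "+r.Error)
	}
	return agg
}

// Active reports whether the policy can be considered Active, i.e. every
// replica initialized it, just like a Deployment is fully available only
// when readyReplicas == replicas.
func (a AggregatedStatus) Active() bool {
	return a.Replicas > 0 && a.ReadyReplicas == a.Replicas
}
```

With a shape like this, the controller would only flip a Policy to Active once ReadyReplicas equals Replicas, and could surface the collected errors otherwise.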
A possible implementation involves having the PolicyServer update the Policy CRD status fields directly, and adding the new error states to the Policy status state machine.
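As a rough sketch of that idea, the Policy status enum could gain an error state alongside the existing ones. The exact state names (especially the new "error" state) and the kubebuilder-style markers below are assumptions, not the actual Kubewarden types:

```go
package v1alpha2

// PolicyStatusEnum describes the lifecycle state of a policy.
// +kubebuilder:validation:Enum=unscheduled;scheduled;pending;active;error
type PolicyStatusEnum string

const (
	PolicyStatusUnscheduled PolicyStatusEnum = "unscheduled"
	PolicyStatusScheduled   PolicyStatusEnum = "scheduled"
	PolicyStatusPending     PolicyStatusEnum = "pending"
	PolicyStatusActive      PolicyStatusEnum = "active"
	// PolicyStatusError is the hypothetical new state entered when one or
	// more PolicyServer replicas failed to initialize the policy.
	PolicyStatusError PolicyStatusEnum = "error"
)

// PolicyStatus would be updated by the PolicyServer (or by the controller
// on its behalf) instead of being driven by the controller alone.
type PolicyStatus struct {
	PolicyStatus PolicyStatusEnum `json:"policyStatus,omitempty"`
	// Message carries the initialization error reported by the replicas.
	Message string `json:"message,omitempty"`
}
```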
Alternatives you've considered
No response
Anything else?
No response
@flavio This is an important spike that I believe needs priority, to make the Kubewarden Policy Server more resilient and a better product. We are wondering when this can be picked up so we can see some progress on it?