A large number of log messages are generated when the platform controller forces an experiment to terminate. This has two drawbacks:
It is harder to find the cause of the termination in the logs because they are filled with exceptions (e.g., RabbitMQ exceptions); the controller even reports errors that it caused through its own behavior.
The ELK stack is forced to process all these log messages.
The majority of these messages could be avoided by changing the behavior of the platform controller.
Reproducibility
Start an experiment. Terminate it. Check the logs.
Expected behavior
The following changes should be implemented:
Mark an experiment as forced to stop. This allows the controller to check whether it makes sense to send additional messages (e.g., the message that a container stopped) to the command queue. The other containers do not have to be informed about the termination since they will be terminated as well.
Terminate containers in the right order. At the moment, the platform controller seems to start with the top element of the experiment's container tree. In many cases, however, this container hosts the experiment's RabbitMQ message broker. Stopping it first causes a lot of exceptions in all connected containers, which can easily be avoided by changing the order of termination.
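The two proposed changes could be sketched roughly as follows. This is only an illustration, not the platform controller's actual code: it assumes the containers of an experiment are kept as a tree in which children depend on their parent, and every name in it (`Container`, `Experiment`, `forced_to_stop`, `termination_order`, `on_container_stopped`, `force_terminate`, and the container names) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Container:
    """Hypothetical node in the container tree of an experiment."""
    name: str
    children: List["Container"] = field(default_factory=list)


@dataclass
class Experiment:
    """Hypothetical experiment record carrying a forced-stop flag."""
    root: Container
    forced_to_stop: bool = False


def termination_order(root: Container) -> List[str]:
    """Return container names in post-order: children before their parent.

    The top element (assumed here to host the message broker) comes last.
    """
    order: List[str] = []
    for child in root.children:
        order.extend(termination_order(child))
    order.append(root.name)
    return order


def on_container_stopped(experiment: Experiment, name: str,
                         notify: Callable[[str], None]) -> None:
    """Send a 'container stopped' message unless the stop was forced."""
    if experiment.forced_to_stop:
        return  # every other container is being terminated as well
    notify(name)


def force_terminate(experiment: Experiment,
                    stop_container: Callable[[str], None],
                    notify: Callable[[str], None]) -> None:
    """Mark the experiment as forced to stop, then stop leaves first."""
    experiment.forced_to_stop = True
    for name in termination_order(experiment.root):
        stop_container(name)
        on_container_stopped(experiment, name, notify)


# Usage with an assumed tree whose top element hosts the broker.
tree = Container("broker", [
    Container("benchmark-controller", [
        Container("data-generator"),
        Container("task-generator"),
    ]),
    Container("system-adapter"),
])
experiment = Experiment(tree)
stopped: List[str] = []
notified: List[str] = []
force_terminate(experiment, stopped.append, notified.append)
```

With this ordering the container hosting the broker is stopped last, and no "container stopped" messages reach the command queue once the experiment is marked as forced to stop.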