SOS aggregator needs more controlled shutdown #42

khuck · 2018-04-14T00:03:43Z

We need some kind of protocol to handle the case where an application has told a listener to shut down, so the listener tells the aggregator to shut down, but the analysis back end still wants data. The client needs a way to check whether the aggregator is shutting down and, if so, to tell the aggregator to wait until the client gives it the OK. Something like:

1. The application client (producer) sends a shutdown/countdown message.
2. The listener/aggregator begins a countdown of X seconds. If the client doesn't cancel within X seconds, it shuts down.
3. Add a "heartbeat" request to the client API so that the client can periodically check whether the aggregator is in the "countdown" process. If so, the client can "cancel" that shutdown with another message.
4. The listener/aggregator cancels the shutdown countdown if the producer client requested the shutdown but the consumer client isn't done yet.
5. After the client is done ingesting the available data, it is required to send a new shutdown message to the aggregator.
6. The aggregator can then really shut down.
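
To make the proposed sequence concrete, here is a minimal sketch of the aggregator side as a small state machine. Everything in it is hypothetical and for illustration only: the `Aggregator` class, the `State` names (including the extra `HELD` state for "countdown cancelled, waiting for the consumer"), and the method names are assumptions, not part of the existing SOS API. It also assumes that the second shutdown message stops the aggregator immediately rather than starting another countdown, which the steps above leave open.

```python
import enum
import time


class State(enum.Enum):
    RUNNING = "running"
    COUNTDOWN = "countdown"  # shutdown requested, waiting X seconds for a cancel
    HELD = "held"            # countdown cancelled, waiting for a new shutdown message
    STOPPED = "stopped"


class Aggregator:
    """Hypothetical model of the proposed shutdown states; not the SOS API."""

    def __init__(self, countdown_seconds, clock=time.monotonic):
        self.countdown_seconds = countdown_seconds
        self.clock = clock  # injectable so the sketch can be exercised without sleeping
        self.state = State.RUNNING
        self.deadline = None

    def request_shutdown(self):
        """Shutdown message: start the countdown, or finish a held shutdown."""
        if self.state is State.RUNNING:
            self.state = State.COUNTDOWN
            self.deadline = self.clock() + self.countdown_seconds
        elif self.state is State.HELD:
            # Assumption: the follow-up shutdown message stops immediately.
            self.state = State.STOPPED

    def heartbeat(self):
        """Heartbeat request: report the state, expiring the countdown if it is due."""
        if self.state is State.COUNTDOWN and self.clock() >= self.deadline:
            self.state = State.STOPPED
        return self.state

    def cancel_shutdown(self):
        """Cancel message: only has an effect while the countdown is still running."""
        if self.heartbeat() is State.COUNTDOWN:
            self.state = State.HELD
            self.deadline = None
            return True
        return False


class FakeClock:
    """Manually advanced clock used to drive the example deterministically."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
agg = Aggregator(countdown_seconds=5, clock=clock)
agg.request_shutdown()        # producer asks for shutdown; countdown starts
clock.now = 2.0
if agg.heartbeat() is State.COUNTDOWN:
    agg.cancel_shutdown()     # consumer still wants data
clock.now = 100.0             # the original deadline passing no longer matters
agg.request_shutdown()        # consumer is done; aggregator really stops
```

In a real implementation the countdown would also have to expire without a heartbeat arriving (for example via a timer in the aggregator's main loop); the sketch only checks the deadline lazily to stay short.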

cdwdirect · 2018-06-07T15:15:54Z

A simple version of this can be done today with the existing feedback / event trigger system. I agree, though, that a thorough review of the shutdown protocols is worth doing, so that we design them to handle the complex workflow cases we're seeing out there: slow filesystems, MPI runtimes that terminate everything when a fork()'ed child exits, etc.

khuck added enhancement requirement next version high priority bug/feature requiring immediate attention labels Apr 14, 2018

khuck assigned cdwdirect Apr 14, 2018
