
SOS aggregator needs more controlled shutdown #42

Open
khuck opened this issue Apr 14, 2018 · 1 comment
Labels: enhancement, high priority (bug/feature requiring immediate attention), next version, requirement


@khuck
Collaborator

khuck commented Apr 14, 2018

We need some kind of protocol to handle the case where an application has told a listener to shut down, so the listener tells the aggregator to shut down, but the analysis back end still wants data. The client needs a way to check whether the aggregator is shutting down and, if so, tell the aggregator to wait until the client gives it the OK. Something like:

  1. application client (producer) sends shutdown/countdown message
  2. listener/aggregator begins a countdown of X seconds. If no client cancels within X seconds, it shuts down.
  3. add a "heartbeat" request to the client API, so that the client can periodically check whether the aggregator is in the "countdown" process. If so, the client can "cancel" that shutdown with another message.
  4. listener/aggregator cancels the shutdown countdown that the producer client requested if the consumer client asks it to because it isn't done yet.
  5. after the consumer client is done ingesting the available data, it is required to send a new shutdown message to the aggregator.
  6. aggregator can then really shut down.
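The steps above can be sketched as a small aggregator-side state machine. This is a minimal, hypothetical sketch in Python, not SOS code: the names `State`, `Aggregator`, `request_shutdown`, `heartbeat`, `cancel_shutdown`, `final_shutdown`, and `FakeClock` are invented for illustration. It assumes a single consumer client, that the consumer's second shutdown message (step 5) takes effect immediately, and that a repeated producer request is ignored while a countdown is running or a consumer holds the shutdown. The clock is injectable so the countdown can be exercised without sleeping.

```python
import time
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    RUNNING = "running"        # normal operation
    COUNTDOWN = "countdown"    # producer asked to shut down; timer is running
    HELD = "held"              # a consumer cancelled the countdown and owns shutdown
    SHUT_DOWN = "shut_down"    # terminal state


class Aggregator:
    """Hypothetical aggregator-side state machine (names are NOT part of SOS)."""

    def __init__(self, countdown_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.countdown_seconds = countdown_seconds  # the "X seconds" of step 2
        self.clock = clock
        self.state = State.RUNNING
        self.deadline: Optional[float] = None

    def poll(self) -> State:
        """Apply countdown expiry (step 2) and return the current state."""
        if self.state is State.COUNTDOWN and self.clock() >= self.deadline:
            self.state = State.SHUT_DOWN
            self.deadline = None
        return self.state

    def request_shutdown(self) -> None:
        """Step 1: the producer asks to shut down, which starts the countdown."""
        if self.poll() is State.RUNNING:
            self.state = State.COUNTDOWN
            self.deadline = self.clock() + self.countdown_seconds
        # Otherwise ignored (assumption): a countdown is already running,
        # a consumer holds the shutdown, or the aggregator is already down.

    def heartbeat(self) -> bool:
        """Step 3: a consumer checks whether a countdown is in progress."""
        return self.poll() is State.COUNTDOWN

    def cancel_shutdown(self) -> bool:
        """Step 4: a consumer that is not done yet cancels the countdown.

        Returns False if there is no countdown to cancel, e.g. it already expired.
        """
        if self.poll() is not State.COUNTDOWN:
            return False
        self.state = State.HELD
        self.deadline = None
        return True

    def final_shutdown(self) -> bool:
        """Steps 5-6: the consumer has finished ingesting and releases the hold."""
        if self.poll() is not State.HELD:
            return False
        self.state = State.SHUT_DOWN
        return True


class FakeClock:
    """Manually advanced clock for exercising the countdown without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now
```

A consumer would call `heartbeat()` periodically; if it returns `True` and data is still pending, the consumer calls `cancel_shutdown()`, and later `final_shutdown()` once it has ingested everything. Supporting several consumers would need something more, for example counting outstanding holds, which the issue does not specify.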
@khuck added the enhancement, requirement, next version, and high priority labels on Apr 14, 2018
@cdwdirect
Owner

A simple version of this can be done today with the existing feedback/event trigger system, but I agree a thorough review of shutdown protocols is worth doing, so that we design it to handle the complex workflow cases we're seeing out there: slow filesystems, MPI runtimes that terminate everything when a fork()ed child exits, etc.
