Proposed enhancements to GNU Parallel

martinda edited this page Apr 16, 2015 · 3 revisions

Enhancement 1: Do not kill the whole process tree when timing out

When the value given to --timeout expires, GNU Parallel signals the entire process tree using the following termination sequence: SIGTERM, wait 200 ms, SIGTERM, wait 200 ms, SIGKILL.

In some cases you might want to dismantle the process tree cooperatively and in an orderly fashion, from the child-most processes to the parent-most processes, giving each sub-process along the way a chance to clean up, write reports, etc. To do this, the signals of the termination sequence need to be propagated from the parent processes down to their child processes and so on, until there are no more children to receive the signals. As the child processes terminate, the process tree is dismantled in order from the bottom up. This termination process is documented as the WUE and WCE methods. Since GNU Parallel broadcasts the termination sequence to the entire process tree at once, an option is needed to turn this broadcast into a targeted operation that signals only the top process of each process tree.
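As a rough illustration of the propagation described above, here is a minimal Python sketch of a cooperating parent process. It is not taken from GNU Parallel or from the WUE/WCE documentation; the function name `supervise` and its structure are hypothetical. The parent forwards SIGTERM one level down, waits for its children to exit, and only then would do its own cleanup.

```python
import signal
import subprocess


def supervise(child_cmds):
    """Start the given commands and forward SIGTERM to them (hypothetical helper).

    Returns the children's exit codes once all of them have terminated.
    """
    children = []

    def forward(signum, _frame):
        # Pass the signal one level down. Each child is expected to do the
        # same for its own children, so the signal travels down the tree.
        for child in children:
            if child.poll() is None:
                child.send_signal(signum)

    # Install the handler before spawning so that no SIGTERM is missed.
    previous = signal.signal(signal.SIGTERM, forward)
    try:
        children.extend(subprocess.Popen(cmd) for cmd in child_cmds)
        # Wait for every child to finish: the tree below this process is
        # gone before this process continues, i.e. bottom-up teardown.
        exit_codes = [child.wait() for child in children]
    finally:
        signal.signal(signal.SIGTERM, previous)
    # The parent's own cleanup (reports, temporary files, ...) would go here.
    return exit_codes
```

With this pattern, signalling only the top process is enough as long as every level of the tree cooperates. A process that is stuck breaks the chain, which is why the proposal keeps a fallback.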

I propose adding the --limit-term-seq-to-parent option to GNU Parallel. When this option is set, GNU Parallel sends the termination sequence only to the parent (top-most) process of each process tree it manages. It is the responsibility of the parent processes to propagate the termination sequence to their child processes. However, when a process in the process tree is stuck and does not respond, we do not want to lock up GNU Parallel. Consequently, if there are still processes running in the process tree once GNU Parallel has applied the termination sequence to the parents, GNU Parallel will fall back to the default behaviour and broadcast the termination sequence to the entire process tree.
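The proposed behaviour could look roughly like the following Python sketch. It is an illustration only, not GNU Parallel's actual implementation: `TERM_SEQ`, `stop_job`, and the helper names are hypothetical, and the sketch assumes each job was started as the leader of its own process group (for example with `start_new_session=True`), so that the whole tree can be addressed through one process group ID.

```python
import os
import signal
import subprocess
import time

# Termination sequence from the text: (signal, seconds to wait afterwards).
TERM_SEQ = [(signal.SIGTERM, 0.2), (signal.SIGTERM, 0.2), (signal.SIGKILL, 0.0)]


def signal_parent_only(proc):
    """Apply the termination sequence to the top process only."""
    for sig, delay in TERM_SEQ:
        if proc.poll() is not None:
            return  # the parent has already exited and been reaped
        proc.send_signal(sig)
        try:
            proc.wait(timeout=delay)
        except subprocess.TimeoutExpired:
            pass  # still running, continue with the next signal
    proc.wait()  # SIGKILL cannot be ignored, so this returns


def group_alive(pgid):
    """Return True if any process is left in the process group."""
    try:
        os.killpg(pgid, 0)  # signal 0 only checks for existence
        return True
    except ProcessLookupError:
        return False


def broadcast(pgid):
    """Default behaviour: send the sequence to the whole process group."""
    for sig, delay in TERM_SEQ:
        try:
            os.killpg(pgid, sig)
        except ProcessLookupError:
            return  # nothing left to signal
        time.sleep(delay)


def stop_job(proc):
    """Signal the parent first, then fall back to a broadcast if needed."""
    pgid = proc.pid  # assumption: proc is the leader of its own group
    signal_parent_only(proc)
    if not group_alive(pgid):
        return "parent-only"
    broadcast(pgid)
    return "broadcast"
```

A job would be started with something like `subprocess.Popen(cmd, start_new_session=True)` and stopped with `stop_job(job)`. Tracking the tree by process group is a simplification: descendants that move to another process group or session would escape the fallback in this sketch.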

TODO

  • deal with --halt in the same way?
  • provide option to change the termination sequence
  • provide option to propagate the termination sequence when GNU parallel receives the TERM signal (so as to interrupt the running jobs)