constrain signal delivery to Scheme to the main thread #813
This change is an attempt to address #809.
My guess here is that the test crashes when signal 14 is delivered to a thread that is not a Scheme thread. In particular, if a GC has happened previously while multiple threads were running, then GC worker threads may be waiting around. They haven't masked signals, and the thread-local variable for a thread context will be NULL in those worker threads.
In that scenario, I don't think the signal would get delivered to a non-main thread on macOS, because `sigprocmask` and the automatic masking of signals during delivery are process-wide there. On Linux, however, the signal mask is thread-specific. That difference would explain why I see the crash rarely and (in retrospect, as far as I can remember) only when trying out different Linux systems and not when working in my main macOS development environment.
Before this change, I can force a crash by ensuring that GC threads have been created, disabling signals in the main thread on Linux, and then having a handled signal delivered to the Scheme process. So, maybe the test was sometimes crashing because the main thread happened to have signals blocked when signal 14 was delivered to the process. That particular problem could not have happened before parallelism was added to the GC, and so maybe it would never have happened in the test before. In practice, there are often extra threads running in a process, and that's why this patch adjusts the signal handler to redirect to the main thread instead of setting the signal mask in GC worker threads.
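For readers who want to see the redirect idea in isolation, here is a minimal standalone sketch. It is not the code from this patch: `main_thread`, `handler`, `worker`, and `demo_redirect` are hypothetical names, and the check uses `pthread_equal` rather than the thread-context variable described above. It reproduces the scenario from the description (an idle worker with signals unmasked, the main thread with the signal blocked) and has the handler forward the signal with `pthread_kill` when it lands on the wrong thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* All names here are hypothetical; this is an illustration, not the patch. */
static pthread_t main_thread;                 /* recorded before any signal arrives */
static volatile sig_atomic_t handled_in_main; /* deliveries that reached the main thread */
static volatile sig_atomic_t redirected;      /* set when a non-main thread forwarded */
static volatile sig_atomic_t stop_worker;

static void nap_1ms(void) {
    struct timespec ts = {0, 1000000L};
    nanosleep(&ts, NULL);
}

static void handler(int sig) {
    if (!pthread_equal(pthread_self(), main_thread)) {
        /* Wrong thread: forward to the main thread and do nothing else here. */
        redirected = 1;
        pthread_kill(main_thread, sig);
        return;
    }
    handled_in_main++; /* stands in for the real work done on the main thread */
}

/* Stands in for an idle GC worker: it never touches its signal mask. */
static void *worker(void *arg) {
    (void)arg;
    while (!stop_worker) nap_1ms();
    return NULL;
}

/* Returns 1 if the signal reached the worker first and then the main thread. */
int demo_redirect(void) {
    struct sigaction sa, old_sa;
    sigset_t block, old_mask;
    pthread_t tid;
    int rc, i;

    main_thread = pthread_self();
    handled_in_main = redirected = stop_worker = 0;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old_sa);

    /* Create the worker first so it inherits a mask with SIGALRM unblocked. */
    rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;

    /* Block SIGALRM in the main thread only, as in the crash scenario. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &block, &old_mask);

    /* Process-directed signal: only the worker is eligible to receive it. */
    kill(getpid(), SIGALRM);
    for (i = 0; i < 2000 && !redirected; i++) nap_1ms();

    /* Unblocking delivers the forwarded signal, now pending on the main thread. */
    pthread_sigmask(SIG_SETMASK, &old_mask, NULL);

    stop_worker = 1;
    pthread_join(tid, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    return redirected && handled_in_main == 1;
}
```

Calling `demo_redirect()` from a program's main thread exercises the path. The point of forwarding in the handler, rather than masking signals in each worker, is that it also covers threads the runtime did not create itself.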