Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

constrain signal delivery to Scheme to the main thread #813

Merged
merged 1 commit into from
Mar 2, 2024

Conversation

mflatt
Copy link
Contributor

@mflatt mflatt commented Feb 29, 2024

This change is an attempt to address #809.

My guess here is that the test crashes when signal 14 is delivered to a thread that is not a Scheme thread. In particular, if a GC has happened previously while multiple threads were running, then GC worker threads may be waiting around. They haven't masked signals, and the thread-local variable for a thread context will be NULL in those worker threads.

In that scenario, I don't think the signal would get delivered to a non-main thread on macOS, because sigprocmask and the automatic masking of signals during delivery is process-wide there. On Linux, however, the signal mask is thread-specific. That difference would explain why I see the crash rarely and (in retrospect, as far as I can remember) only when trying out different Linux systems and not when working in my main macOS development environment.

Before this change, I can force a crash by ensuring that GC threads have been created, disabling signals in the main thread in Linux, and then having a handled signal delivered to the Scheme process. So, maybe the test was sometimes crashing when the main thread happens to have signals blocked in the main thread when signal 14 is delivered to the process. That particular problem could not have happened before parallelism was added to the GC, and so maybe it would never have happened in the test before. In practice, there are often extra threads running in a process, and that's why this patch adjusts the signal handler to redirect to the main thread instead of setting the signal mask in GC worker threads.

The intent is to avoid crashes when a signal gets delimited to a
thread that might not even be a Scheme thread. Also, we don't try to
queue the event directly in the main thread's context, because then
we'd need more of a lock (while signal handling is otherwise an
implicit lock).
Copy link
Contributor

@burgerrg burgerrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a good find!

@mflatt mflatt merged commit fc081fc into cisco:main Mar 2, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants