Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: setup signal handlers in interpreted code #14766

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

ysbaddaden
Copy link
Contributor

@ysbaddaden ysbaddaden commented Jun 29, 2024

The interpreter won't receive signals anymore, since the interpreted code will receive them, but the interpreter's signal loop fiber won't be resumed and it won't handle signals anymore.

Fixes the default handler for SIGCHLD into the interpreted code, so Process.run now works as expected and don't hang forever.

closes #12241
Alternative to / closes #14019

TODO:

  • invalid memory access segfaults (signal 11) in some specs
  • investigate why we get these segfaults

I already marked some specs as pending (mostly those about segfaults), but there seems to be another one failing, or maybe it's a more common occurrence.

  • segfault to run std spec when crystal has been compiled with --release

Maybe the new segfault is related to the support of fork in the interpreter?

The interpreter won't receive signals anymore, since the interpreted
code will receive them, but the interpreter's signal loop fiber won't be
resumed and it won't handle signals anymore.

Unblocks the default handler for SIGCHLD into the interpreted code, so
Process.run and waiting for a sub-process works just as expected.
@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Jun 30, 2024

Hypothesis A: the signal handler is running on the current stack, and that's messing with the interpreter. Maybe registering an alternate stack with sigaltstack() and pass SA_ONSTACK to sigaction.

Hypothesis B: the signal handler is replaced by interpreted code, which may do things that aren't thread safe, for example call into GC from the signal handler (not the signal loop). We might want the interpreted code to change the writer fd of the interpreter, so the signal handler runs in the interpreter but will be received by the interpreted code.

The interpreted code cannot register signal handlers because it will run
interpreted code that isn't safe. Even registering SIG_IGN must happen
in the interpreted, otherwise the handler tries to run some interpreted
and segfaults trying to allocate memory.

Introduces a couple interpreter intrinsics to:

2. register the signals to the OS, so the process can receive them;
1. replace the interpreter's signal writer to write to the interpreted
   code's signal loop;

C signal handlers will thus receive the signals and handle them in the
interpreter (minimal, non interpreted code, that writes to a file
descriptor) while the crystal signal handlers will be interpreted
normally.
We can reenable a few signal specs, but a couple ones to detect or
handle segfaults take forever to complete and have been disabled when
interpreted.
@ysbaddaden
Copy link
Contributor Author

Hypothesis B was the issue.

We needed a couple interpreter intrinsics after all. The interpreter is registering the handlers (can't run interpreted code in C signal handlers), so it receives the signals and forwards them to the interpreted signal pipe, and the signal loop of the interpreted code will invoke the interpreted handlers 👍

Note: the interpreter won't handle signals anymore, but it already didn't handle them after starting to interpret code, since the interpreter bypasses the fiber scheduler (it directly resumes fibers) so the signal loop fiber was never resumed.

@ysbaddaden
Copy link
Contributor Author

With this patch, I can run the full minitest.cr test suite using the interpreter ❤️

@ysbaddaden
Copy link
Contributor Author

ysbaddaden commented Jun 30, 2024

Running the std specs with the interpreter was painfully slow, so I compiled crystal in release mode and then tried to run the specs, but it segfaulted after a bunch of forks, and before reporting any progress. I tried running:

$ SPEC_SPLIT="3%4" bin/crystal i spec/std_spec.cr

A gdb session shows that the interpreter eventually segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555e2181e in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:354
354           {% begin %}
(gdb) bt
#0  0x0000555555e2181e in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:354
#1  0x0000555555c4a753 in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/interpreter.cr:252
#2  0x00005555561ca274 in interpret ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:92
#3  interpret_and_exit_on_error ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:96
#4  0x00005555561c6587 in run_file ()
    at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/interpreter/repl.cr:65
#5  repl () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command/repl.cr:47
#6  0x0000555556126ea4 in run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:104
#7  0x000055555574671c in run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:55
#8  run () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal/command.cr:54
#9  __crystal_main () at /home/julien/work/crystal-lang/crystal/src/compiler/crystal.cr:11
#10 0x000055555574cf59 in main_user_code () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:118
#11 main () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:104
#12 main () at /home/julien/work/crystal-lang/crystal/src/crystal/main.cr:130

@ysbaddaden
Copy link
Contributor Author

Interestingly the specs that failed when calling out to crystal i can still fail when calling out crystal build on CI, meaning the interpretation ain't the issue, but something else (subprocess std in/out/err communication?) 🤔

@ysbaddaden
Copy link
Contributor Author

I think the problem, now, is support for fork.

It's likely unsafe to fork/exec in the interpreted code, and we should fork and exec from the interpreter directly... or maybe we should use posix_spawn instead for interpreted code on UNIX? Or we should have an intrinsic for spawning a process in the interpreter (i.e. call fork/exec out of the interpreted code) 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interpreter: Process#run broken with defunct child process
1 participant