Catching traps #206

Open
MikeInnes opened this issue Apr 21, 2022 · 13 comments

@MikeInnes

MikeInnes commented Apr 21, 2022

In general this proposal looks great. However, I don't understand why traps can't be caught. The proposal states:

The rationale for this is that in general traps are not locally recoverable and are not needed to be handled in local scopes like try-catch.

But surely the idea of exceptions is to provide non-local error recovery. Meanwhile, traps can be handled by JS (in the browser), so they are not treated as fundamentally unrecoverable elsewhere.

True catch-alls are not that common, but important examples could include:

  • a REPL or JIT which compiles and runs user code on the fly;
  • an Erlang-like process model where tasks may crash entirely, but a supervisor can reboot as needed.

In these cases errors should unwind the stack, running catch blocks for destructors and cleanup, so that the program can continue from a sensible state. You'd also want to catch and display the stack trace, or even launch a debug UI. If traps can't do this, we must instead crash the application and ask the user to restart.

I can think of three broad workarounds in those cases:

  • Have the compiler turn traps into exceptions, by bounds checking pointer loads and so on. Aside from any performance cost, though, this wouldn't help foreign calls to separately-compiled code.
  • Use a JS exception handler when a true catch-all is needed. However, this complicates implementation (some kind of shadow stack would be needed so the supervisor can run destructors) and precludes supporting WASI-only runtimes, where a trap would still have to crash the process.
  • The above but with memory isolation, so that if a task fails we just throw out its instance. But this is the worst of both worlds (performance hit and no WASI) and also doesn't account for external resources (eg GPU memory handles).

There's an example of using a JS handler in the wat2wasm demo, though it's of course simpler for not having to handle cleanup or shared memory.
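For reference, host-level catching of a trap can be sketched roughly as follows in TypeScript. This is a minimal sketch, not the wat2wasm demo itself: the hand-assembled module (a single exported `div` wrapping `i32.div_s`) and the helper name `tryDiv` are assumptions made for illustration. It relies on the JS API surfacing a trap as a `WebAssembly.RuntimeError`.

```typescript
// Hand-assembled module (illustrative): exports div(a, b) = i32.div_s(a, b).
const divBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x64, 0x69, 0x76, 0x00, 0x00, // export "div"
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section, one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6d, 0x0b, // local.get 0, local.get 1, i32.div_s, end
]);

const divInstance = new WebAssembly.Instance(new WebAssembly.Module(divBytes));
const wasmDiv = divInstance.exports.div as (a: number, b: number) => number;

// Hypothetical helper: runs the export and reports whether it trapped.
function tryDiv(a: number, b: number): string {
  try {
    return `ok: ${wasmDiv(a, b)}`;
  } catch (e) {
    // A trap reaches the host as a WebAssembly.RuntimeError.
    if (e instanceof WebAssembly.RuntimeError) {
      return "trapped";
    }
    throw e; // anything else is not ours to handle
  }
}

console.log(tryDiv(6, 3));
console.log(tryDiv(1, 0));
```

Nothing on the Wasm side gets a chance to run cleanup here, which is the gap the shadow-stack idea above is meant to fill.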

So, why are traps a special case? I'd prefer to be able to catch them, but if there's some deeper reason why that's not possible, perhaps that rationale could be made clearer.

@tlively
Member

tlively commented Apr 21, 2022

This was discussed at length in the CG's last in-person meeting almost three years ago and also here on issues. Briefly reviewing the discussions, it looks like there were some back-compat concerns and also code size concerns.

@MikeInnes
Author

MikeInnes commented Apr 22, 2022

Thanks a lot for those links, that's really useful.

To paraphrase the discussion there, it sounds like the most important reason is that C++ needs abort semantics where execution ends immediately, without destructors or other cleanup – anything else would be surprising and potentially make debugging harder. While it's possible to have cleanup blocks check for traps and rethrow, this would add several bytes to a lot of function calls.

As I understand it this doesn't rule out catching traps per se – it just means they might be handled later, by a separate proposal. It seems feasible to have a catch_all (or finally) variant that includes traps, for example.

Edit: thinking about it, it's possibly better to opt in to traps-as-exceptions per call site (eg a variant divide instruction), rather than use a true catch-all. That's what the check-and-throw pattern emulates, and that way you're asserting that the error at that location can be safely handled, not that your handler can recover from any error in any module. You'd still need some kind of inter-module trap handling as has been discussed elsewhere.

@rossberg
Member

The deeper reason, I think, was that we regard traps as a different class of event. Unlike exceptions, traps are a fatal failure that should never happen in correct code. That is, they are an indication of a bug not in the application but in the code generator, e.g., the compiler that produced the Wasm binary. If such an event is encountered, the program generally is in an inconsistent state, and any attempts to run further code, such as handlers or finalisers, are inherently dangerous.

It is true, though, that there are scenarios where supervisor code might want to recover from traps. But currently, this would happen at a different (host) level anyway, so it can use external capabilities or API for catching traps (and then potentially forward to other Wasm code). A pure Wasm-hosts-Wasm scenario, where traps necessarily have to be caught in Wasm itself, currently lacks many other capabilities, such as in-Wasm module instantiation. I imagine that a comprehensive future Wasm-internal meta API would include the ability to recover from traps somehow.

@MikeInnes
Author

Excellent, thanks for the additional insight. I agree that traps are fatal at the task level, and signal a bug somewhere.

[traps] are an indication of a bug not in the application but in the code generator, e.g., the compiler that produced the Wasm binary.

But surely this conflicts with how traps are used today? eg emcc compiles x/0, bad pointers and invalid function references into code that traps. As I understand it this is a feature (for the reasons above), not a compiler bug – the fault is in the application.

Code generators could only rule out all application traps by turning them into exceptions. If everyone did this, it'd be equivalent to making traps catchable, with more overhead. At any rate that isn't the situation today: most compilers seem to treat traps as a valid output for a faulty application. Am I misunderstanding?

@tlively
Member

tlively commented Apr 28, 2022

For a language without undefined behavior, a trap would typically signify a code generator bug. For a language with undefined behavior (like division by 0 in C++), a trap is also a convenient way for the language to throw up its hands and give up trying to produce meaningful results as quickly as possible. In both cases, it is not reasonable to expect traps to be caught. If the trap is due to a compiler bug, the only reasonable response is to fix or work around that bug. If the trap is due to undefined behavior, then the program needs to be fixed to not have UB.

@rossberg
Member

rossberg commented Apr 28, 2022

What @tlively said. In addition, the occurrence of UB in a language like C/C++ actually means that all past(!) and future behaviour of the same program execution is undefined as well, so that recovering isn't even a meaningful operation.

@titzer
Contributor

titzer commented Apr 28, 2022

Languages that have well-defined behavior on e.g., division by zero or access of a null pointer (in the GC proposal), will want to catch traps and generate source-level exceptions. Otherwise, they will not be performance-competitive with their native implementations. I think we need a solution for this, which could take the form of a thread-local or module-defined trap handler.

@MikeInnes
Author

For a language without undefined behavior, a trap would typically signify a code generator bug.

Ok I see, that clarifies things. Thanks again all for taking the time to explain.

I think it's helpful to be explicit that statements like "traps typically signify UB or codegen bugs" or "it's not reasonable to catch traps intra-module" are true by definition, because that's how traps are intended to be used. Which errors are actually treated as unrecoverable is an independent, source language question. x/0 (the wasm instruction) isn't allowed in a correctly compiled, well-defined program, but you don't mean to imply that x/0 (the source operation) isn't supported, or that it shouldn't ever be recovered. Conversely, it's possible for a compiler to use traps for recoverable errors, it just shouldn't.

In other words, "traps signify fatal errors" is prescriptive, not descriptive. I think that was my main point of confusion.

To summarise in terms of the original issue: languages that don't want x/0 and co to abort the module should override that default with a branch. In future (I speculate) we might get options that reduce the size/performance overhead from this workaround, like instruction variants that throw or return a default instead of trapping.

This is the first workaround I listed above, where I mentioned that it doesn't deal with foreign calls. But a foreign module that traps is telling you it's FUBARed, so to recover you'll usually need to dump it and reinitialise. That implies host functionality – hence why trap recovery is tied either to the JS API or a future WASM-hosts-WASM proposal.
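The "dump it and reinitialise" recovery on the host side might look something like the sketch below. It is only an illustration under stated assumptions: the hand-assembled module (an exported `div` wrapping `i32.div_s`) stands in for a foreign module, and `freshDiv` and `supervise` are hypothetical names. The supervisor never reuses an instance after it has trapped; it instantiates a new one from the same compiled module.

```typescript
// Hand-assembled stand-in for a foreign module: exports div(a, b) = i32.div_s(a, b).
const foreignBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x64, 0x69, 0x76, 0x00, 0x00, // export "div"
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section, one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6d, 0x0b, // local.get 0, local.get 1, i32.div_s, end
]);

// Compile once; instantiate as often as needed.
const foreignModule = new WebAssembly.Module(foreignBytes);

type DivFn = (a: number, b: number) => number;

// Hypothetical helper: a brand-new instance with fresh state.
function freshDiv(): DivFn {
  return new WebAssembly.Instance(foreignModule).exports.div as DivFn;
}

// Hypothetical supervisor: runs each job and replaces the instance after a trap.
function supervise(jobs: Array<[number, number]>): { results: Array<number | null>; restarts: number } {
  let div = freshDiv();
  let restarts = 0;
  const results: Array<number | null> = [];
  for (const [a, b] of jobs) {
    try {
      results.push(div(a, b));
    } catch (e) {
      if (!(e instanceof WebAssembly.RuntimeError)) throw e;
      results.push(null); // this job failed
      div = freshDiv(); // discard the trapped instance entirely
      restarts++;
    }
  }
  return { results, restarts };
}

console.log(supervise([[6, 3], [1, 0], [8, 2]]));
```

This only resets state owned by the instance; anything held outside it (shared memory, external handles) would still need its own cleanup.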

@dschuff
Member

dschuff commented Apr 29, 2022

That seems like a reasonable interpretation. I agree with @titzer (and you, if I understand you correctly) that we should eventually have a way to handle things like division by zero or NPEs. Whether that mechanism is turning traps into something observable inside wasm or something else (e.g. throwing versions of some instructions) is something that we'd want to discuss.

@rossberg
Member

rossberg commented May 2, 2022

@titzer, @dschuff, I seem to recall that we converged on throwing versions of the relevant instructions for that purpose during the f2f meeting. That's certainly cleaner and simpler than a complex new mechanism.

@MikeInnes
Author

Yes, I think we're on the same page, and I'm glad to see that throwing variants are on the table.

FWIW, some languages may also want to turn traps into values, rather than exceptions. For example Pony defines x/0 = 0. That's pretty unusual but may be worth considering nonetheless.
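To make the two flavours concrete, here is a rough TypeScript model of the guard a compiler would have to emit today in place of a bare `i32.div_s`. It is a sketch, not real codegen: `divThrowing` and `divOrDefault` are hypothetical names, `RangeError` stands in for whatever source-level exception a language would raise, and the arithmetic only models signed 32-bit division. The two guarded cases are the ones where `i32.div_s` traps: a zero divisor, and `INT_MIN / -1`.

```typescript
const I32_MIN = -2147483648;

// True for exactly the operand pairs on which i32.div_s would trap.
function wouldTrap(a: number, b: number): boolean {
  return b === 0 || (a === I32_MIN && b === -1);
}

// "Throwing variant": branch first, raise a catchable error instead of trapping.
function divThrowing(a: number, b: number): number {
  if (b === 0) throw new RangeError("division by zero");
  if (a === I32_MIN && b === -1) throw new RangeError("integer overflow");
  return Math.trunc(a / b) | 0; // truncate toward zero, like i32.div_s
}

// "Value variant": branch first, return a default instead (Pony-style x/0 = 0).
function divOrDefault(a: number, b: number, fallback: number = 0): number {
  if (wouldTrap(a, b)) return fallback;
  return Math.trunc(a / b) | 0;
}

console.log(divThrowing(7, 2));
console.log(divOrDefault(5, 0));
```

The extra branch per division is the size and speed overhead discussed above; dedicated instruction variants would presumably fold it into the instruction.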

@daxpedda

I've stumbled upon similar issues that are being discussed here but couldn't really find a satisfying answer anywhere.

As far as I understand, traps are not supposed to be caught and should abort the module or prevent future execution. But currently in browsers, traps don't do that: execution is aborted, but the module is not invalidated and execution can be resumed at any time, for example by event listeners that were registered before.

Is this a problem the exception handling spec doesn't deal with? The only wording I could find is here (link):

If the call stack is exhausted without any enclosing try blocks, the embedder defines how to handle the uncaught exception.

Which is about exceptions and not traps.

I would appreciate it if somebody could point me in the right direction if I'm wrong here.
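A small TypeScript sketch of the behaviour described above, under some assumptions: the module is hand-assembled for illustration, with a hypothetical `step` export that increments a mutable global and then executes `unreachable`, and a `get` export that reads the global. It suggests that after a trap the JS API leaves the instance callable, and that writes made before the trap are not rolled back.

```typescript
// Hand-assembled module (illustrative):
//   global $n (mut i32) = 0
//   step(): $n = $n + 1; unreachable
//   get():  returns $n
const counterBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x08, 0x02, 0x60, 0x00, 0x00, 0x60, 0x00, 0x01, 0x7f, // types: () -> (), () -> i32
  0x03, 0x03, 0x02, 0x00, 0x01, // two functions: type 0, type 1
  0x06, 0x06, 0x01, 0x7f, 0x01, 0x41, 0x00, 0x0b, // one mutable i32 global, initial value 0
  0x07, 0x0e, 0x02, // two exports
  0x04, 0x73, 0x74, 0x65, 0x70, 0x00, 0x00, // "step" -> func 0
  0x03, 0x67, 0x65, 0x74, 0x00, 0x01, // "get" -> func 1
  0x0a, 0x11, 0x02, // code section, two bodies
  0x0a, 0x00, 0x23, 0x00, 0x41, 0x01, 0x6a, 0x24, 0x00, 0x00, 0x0b, // step body
  0x04, 0x00, 0x23, 0x00, 0x0b, // get body
]);

const counterInstance = new WebAssembly.Instance(new WebAssembly.Module(counterBytes));
const stepExport = counterInstance.exports.step as () => void;
const getExport = counterInstance.exports.get as () => number;

// Hypothetical helper: call step(), swallow the trap, then read the global
// back from the very same instance.
function stepAndReport(): { trapped: boolean; counter: number } {
  let trapped = false;
  try {
    stepExport();
  } catch (e) {
    if (!(e instanceof WebAssembly.RuntimeError)) throw e;
    trapped = true;
  }
  return { trapped, counter: getExport() };
}

console.log(stepAndReport());
console.log(stepAndReport());
```

If this matches what browsers do, a later callback re-entering the instance would see whatever half-updated state the trapping call left behind.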

@titzer
Contributor

titzer commented Jan 24, 2023

The more I think about this, the more I prefer a module-local trap-handler mechanism, as opposed to throwing variants of instructions. Some modules are uncooperative and may not use the throwing variants, and yet a caller module might want to catch traps that that module generated but failed to handle. A good use case is a test harness that runs "code-under-test". Another good use case is a crash handler.

Throwing variants of instructions scales poorly because every new instruction that generates traps gets a second variant. And most of the GC bytecodes can trap (e.g. on null).
