Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[web]: discrepancies between JS and Wasm backends with -O3 or higher. #56949

Open
sigmundch opened this issue Oct 23, 2024 · 19 comments
Open

[web]: discrepancies between JS and Wasm backends with -O3 or higher. #56949

sigmundch opened this issue Oct 23, 2024 · 19 comments
Labels
area-dart2wasm Issues for the dart2wasm compiler. area-web Use area-web for Dart web related issues, including the DDC and dart2js compilers and JS interop. web-dart2js

Comments

@sigmundch
Copy link
Member

Unsafe optimizations levels like -O3 and -O4 in dart2js and dart2wasm don't promise any guarantees around semantic behavior if assumptions are wrong. However, they currently take different routes and can cause friction for developers that support both JS and WASM outputs or that are migrating from one to the other.

Take this minimal example:

void main () {
  try {
    final Map<String, dynamic> map = {};
    int value = map['key']; // implicit null check
    print(value); // (1)
  } catch (e) {
    print('value was null'); // (2)
  }
}

Here:

  • The implicit cast and null check on the assignment to int value fails in -O2 and lower for both compilers, and gets caught by the try-catch block (2).
  • When using -O4, dart2js will print null on (1) instead.
  • When using -O4, dart2wasm will hit a wasm trap and exit with a runtime error that is not caught in (2).

Unfortuantely, flutter tools default to using -O4, and it is not obvious to developers that the discrepancy in semantics is coming from unsafe optimizations.

@sigmundch sigmundch added web-dart2js area-web Use area-web for Dart web related issues, including the DDC and dart2js compilers and JS interop. area-dart2wasm Issues for the dart2wasm compiler. labels Oct 23, 2024
@osa1
Copy link
Member

osa1 commented Oct 24, 2024

I think the only way to fix some of these discrepancies is by implementing Dart properly in both targets. For example:

void main () {
  final List<int> ints = [1, 2, 3];
  final int i = ints[10];
  f(i);
}

f(i) {
  print(i.runtimeType);
  print(i);
}

Here the out-of-bounds access will cause a Wasm trap, which cannot be caught from within the same Wasm execution. So the execution of this code will just halt at that point. In JS you get an undefined, which you can pass around, test for etc.

So the effect of omitting the same check in both platforms are different.

Making dart2wasm work like dart2js here requires adding the bounds check back. But there isn't an undefined equivalent in Wasm, so once we do that we should throw a Dart error, per spec.

Making the dart2js work like dart2wasm is probably not possible, and certainly not desirable.

So the only way to make both work the same way is to add the bounds check back, and once we do that we can just implement the proper Dart semantics. (throw an exception)

@lrhn
Copy link
Member

lrhn commented Oct 24, 2024

I don't think we should make any guarantees about what unsafe optimizations do on different platforms, and especially not that they do the same thing.
We're leaving Dart semantics, and how that looks depends entirely on which underlying semantics we fall back to instead.

You should be developing and testing without unsafe optimizations, so that any bug actually throws. Defaulting to -O4 in development is asking for trouble.
The difference between unsafe optimizations on different platforms should only matter if you are debugging in production. (Which does happens.)

@osa1
Copy link
Member

osa1 commented Oct 24, 2024

You should be developing and testing without unsafe optimizations, so that any bug actually throws. Defaulting to -O4 in development is asking for trouble.

This wouldn't help finding the bug in flutter/devtools#8452. The original code was the following:

static FilterTag? parse(String value) {
  final parts = value.split(filterTagSeparator);
  try {
    final useRegExp = parts.last == useRegExpTag;
    final query = parts[0].trim();
    final settingFilterValues =
        (jsonDecode(parts[1]) as List).cast<Map<String, Object?>>();
    return FilterTag(
      query: query,
      settingFilterValues: settingFilterValues,
      useRegExp: useRegExp,
    );
  } catch (_) {
    // Return null for any parsing error.
    return null;
  }
}

If you run this without unsafe optimizations the catch block catches the errors and this works as expected.

In unsafe mode, the catch block doesn't catch the errors because the errors turn into traps (halts the execution, cannot be caught).

The way to test this is with unsafe optimizations trapping when things go wrong, which is exactly what the release mode does.

@lrhn
Copy link
Member

lrhn commented Oct 24, 2024

The code in question should at least have used on Exception to catch parsing exceptions, maybe even on FormatException, and checked that parts has at least two elements.
We have two style guide entries ([1], [2]) for that, one of them with a lint.

Doing parts[1] + catch(_) instead of checking that there is a parts[1] is discouraged, even if you do it indirectly. If it's not deliberate, which is more likely, this is a good example of why those "use an on type" recommendations exist.
If you do catch (_) {/* ignore error */ }, you hide all errors, including bugs in your own code.
(If anything, the untrappable WASM error was a bonus if it led to this bug being discovered.)

I guess the conclusion would be to enable more lints, and also test with -O4, on all platforms, if you want to know that you find all problems.
And that testing can't find all bugs, especially if the code actively conceals errors.

@sigmundch
Copy link
Member Author

Completely agree here. I'd very much would like the default to not be -O4. At first, we should make it easier to make -O2 available for development and debugging, but with time I'd like the default to change. I think the right approach for this, which @osa1 also mentioned in an internal thread, is to use scoped pragmas as @rakudrama has suggested in the past: then let framework developers apply the pragmas selectively in performance critical logic, but leave application code with the checks.

I like the idea of enabling lints to detect scenarios that hide assumptions made by -O4. As both of you said, one such lint should be to require precise catch clauses. Are there are others to include?

/cc @yjbanov @eyebrowsoffire - any reservations with these ideas for switching the default optimization flags for flutter web over time?

/cc @bkonyi - the flutter tool today exposes a way to switch dart2js builds to use different optimization levels, I was wondering if we could generalize this to apply to wasm too?

@osa1
Copy link
Member

osa1 commented Oct 25, 2024

Regarding testing unsafe code, another idea could be adding a mode to both dart2js and dart2wasm (and any other target that omits runtime checks) to convert the checks we want to avoid in production mode into some kind of crash/failure that is not possible to catch and handle in Dart.

So basically the debug mode, but runtime checks omitted in release mode stop execution when failed, in a way that cannot be accidentally handled.

Stopping the execution can be done with a trap in Wasm, but I don't know if it's possible to do in JS.

We could even make this the only debug mode, which effectively changes runtime error semantics of the language by making them impossible to catch and handle. (because both debug and release modes of the compiler makes them impossible to catch)

@osa1
Copy link
Member

osa1 commented Oct 25, 2024

From #56655 (comment):

Our breaking change policy says that it's not breaking to change behavior of code that throws an Error.
It would also say that changing unspecified behavior is not considered breaking.

So you really shouldn't depend on catching and handling Errors. I think this is in support of the idea that the debug mode perhaps by default should just turn Errors into uncatchable things (e.g. a trap in Wasm).

@rakudrama
Copy link
Member

From #56655 (comment):

Our breaking change policy says that it's not breaking to change behavior of code that throws an Error.
It would also say that changing unspecified behavior is not considered breaking.

So you really shouldn't depend on catching and handling Errors. I think this is in support of the idea that the debug mode perhaps by default should just turn Errors into uncatchable things (e.g. a trap in Wasm).

I am wary of special debugging modes.

Apps need the ability to report back to the server crashes of every kind.
This is how some problems are found and debugged.
Sometimes developers need to resort to interplanetary print-debugging, by pushing fresh builds that contain more logging until someone can understand the problem.

How actionable are wasm traps? Can the complete stack be deobfuscated at the server?

@osa1
Copy link
Member

osa1 commented Oct 28, 2024

I am wary of special debugging modes.

I was thinking that the debug mode should do it, I'm not proposing another debug mode.

Apps need the ability to report back to the server crashes of every kind.
This is how some problems are found and debugged.
Sometimes developers need to resort to interplanetary print-debugging, by pushing fresh builds that contain more logging until someone can understand the problem.

So Errors need to be caught for debugging and logging purposes.

In that case I think what we want is:

  1. Make sure errors are not caught accidentally.

    avoid_catches_without_on_clauses makes catching errors explicit (we can also consider on Object as explicitly wanting to catch errors).

    We can also enable avoid_catching_errors for good measure, and require the user to add // ignore: avoid_catching_errors to make it even more explicit.

    So I agree with @lrhn on this

    I guess the conclusion would be to enable more lints, ...

    But I'm a bit surprised that none of these lints are even in the recommended lints (https://pub.dev/packages/lints/versions/2.1.1), especially considering that Error behavior of a code can be changed without considering that as breaking:

    Our breaking change policy says that it's not breaking to change behavior of code that throws an Error.
    It would also say that changing unspecified behavior is not considered breaking.

    So we really don't want the users to rely on catching Errors, but we don't have the lint even in the recommended lints.

    I work on the compiler and I didn't know that the user code shouldn't rely on Errors, and DevTools code also don't this. I think most users won't be aware that they shouldn't rely on Errors, and the Dart tooling won't help them by default.

    Should we add one or both of these lints to the recommended lints?

  2. Omit errors we omit in release mode in debug mode as well.

    This is so that we test using the language semantics used by the compiler in release mode.

    Right now it's as if we are testing with one version of the language and deploying on another. We should test and release using the same version of the language.

Any thoughts on these @lrhn @rakudrama?


How actionable are wasm traps? Can the complete stack be deobfuscated at the server?

Wasm traps can be caught by the JS code calling a Wasm function (that could be the Wasm function for Dart main, or some callback of a Promise etc.), and the JS code gets full stack trace including Wasm functions. The stack frames will have names if the Wasm module has a name section. We also generate source maps now with function names and locations of the instructions, so that can also be used.

You just can't catch Wasm exception from within the same Wasm execution. The only way to catch a Wasm trap from Wasm code is by catching it in JS, and then passing/returning the caught trap to Wasm. So the call stack would look like: [Wasm (handles trap), JS (catches trap, returns to caller), Wasm (traps)].

Since in the browser you can't start running Wasm without JS, in practice we can catch all Wasm crashes and log/print.

@osa1
Copy link
Member

osa1 commented Oct 28, 2024

I checked some of the devtools packages for these two lints.

  • devtools_app:

    • 71 violations of `avoid_catches_without_on_clauses
    • 4 violations of avoid_catching_errors:
      • 1 AssertionError
      • 2 StateError
      • 1 Error
  • devtools_app_shared:

    • 16 violations of avoid_catches_without_on_clauses
    • 1 violations of avoid_catching_errors
      • 1 UnimplementedError
  • devtools_shared:

    • 7 violations of avoid_catches_without_on_clauses
    • 2 violations of avoid_catching_errors
      • 1 Error
      • 1 StateError

Fixing these may require significant amount of refactoring, as unlike the bug referenced above, these code don't seem to catch an error directly thrown by one of the lines that can be fixed by checking for errors before the call.

@bkonyi
Copy link
Contributor

bkonyi commented Oct 28, 2024

/cc @bkonyi - the flutter tool today exposes a way to switch dart2js builds to use different optimization levels, I was wondering if we could generalize this to apply to wasm too?

Is this not already exposed via the --optimization-level flag on flutter build web?

@mkustermann
Copy link
Member

mkustermann commented Oct 30, 2024

The unsafe -O3 and -O4 modes are by definition unsafe. Apps will in those unsafe modes run correctly under assumptions (e.g. assuming no IndexErrors, type errors ... are thrown). If the assumptions aren't met, then we have undefined behavior.

The undefined behavior may manifest in
a) no exceptions, printing nulls, storing nulls into fields that are non-nullable, take wrong control flow, even make wrong REST api calls, ... anything could happen
b) throw an exception (e.g. a null exception) - possibly in a different place as where the error actually occurred
c) trap - possibly in a different place as where the error actually occured

I don't think it makes sense to even attempt to align dart2js and dart2wasm on which of those kinds of undefined behavior is triggered, i.e. align them on specific scenarios on a) or b) or c).

It's understandable that a user may want to be able to catch all exceptions. Though if an exception was triggered by undefined behavior, the app is out-of-control, it may have done many wrong things before it threw the exception. It may have triggered things in the event loop later that throw, may have incorrectly modified program state, ... So even if we didn't trap in dart2wasm but threw an exception, there's just no guarantees for a programmer that the app is running correctly afterwards either

=> It's a false sense of "safety" to think that any undefined behavior inside the body of a try try {} catch {} will not cause more undefined behavior after the try {} catch {}.

Overall my hope is that we can make things fast enough with dart2wasm that we don't need unsafe modes anymore or only use it in very limited, scoped circumstances. If we can make the cost within 10% of perf/size, maybe that's acceptable.

IMHO flutter should've definitely made the default to be sound mode and let users explicitly opt out of this. But since dart2js was using -O4 already as default for long time, it made sense to use this for dart2wasm - to make numbers comparable.

@lrhn
Copy link
Member

lrhn commented Oct 30, 2024

It's understandable that a user may want to be able to catch all exceptions.

And just to be clear, here that means catching all errors.

You should always catch all Exceptions, and your correct program should not throw any Errors.
Catching Errors from separate self-contained subcomponents may be safe, but if the erroneous code can alter global state, there is no guarantee that the program is in a consistent state after an Error has been thrown.

Anything can happen after an Error is thrown, only limited by the language semantics.

With unsafe optimizations enabled, that becomes "Anything can happen after an Error should be thrown", whether it is thrown or not, and not limited by language semantics. Not much guarantee at all.

@osa1
Copy link
Member

osa1 commented Oct 30, 2024

Since fixing these issues in user code is not easily possible, and we still want to omit the checks we are omitting with -O4, I think we should convert errors that we do not throw in optimized modes (out-of-bounds errors, some of the type errors etc.) into uncatchable errors (like traps in Wasm) in -O0.

With this I can test that I'm not relying on errors that will disappear when I deploy my app.

This by itself won't fix the discrepancies, because dart2wasm and dart2js can omit different types of errors, or in different standard library functions.

One way to fix the discrepancies is to do this for all errors. With this users can test that they are not relying on errors, and the backends can omit the ones that makes sense to omit, for performance.

I think it we just turn out-of-bounds and type errors to uncatchable crashes that should solve the big part of the issue.

@lrhn
Copy link
Member

lrhn commented Oct 30, 2024

Converting errors to being uncatchable seems like something that should have an opt-in flag, and not something that should be default behavior. Because it's not spec-compliant behavior.

Then it makes good sense as "yet another non-standard mode".

It would worry me if a compiler has no standard compliant mode. That makes it incredibly hard to test that it actually does what it should. It'll be basically impossible to run language tests. There should be an implementation of the specified behavior that options can then choose to diverge from.

@sigmundch
Copy link
Member Author

Agree about being opt-in only. Unfortunately, I don't believe we can replicate it in JS. We could bypass on-clauses and make errors more visible, but it won't match wasm traps behavior.

To bring this back to the original issue. Our goal is to allow for a seamless transition between JS and Wasm backends.

@yjbanov @eyebrowsoffire @kevmoo @srujzs @natebiggs and I brainstormed about this too. Here is a summary of our discussions.

We identified 3 sources of inconsistencies between JS and Wasm today that make this transition difficult:

  • (a) APIs availability (e.g. dart:html is not allowed in wasm)
  • (b) Inconsistent behavior by design (e.g. int64, cast to int/double, js-interop inconsistencies)
  • (c) Inconsistent behavior due to unsafe modes (O3 or O4)

We need to keep apps where they will behave consistently. This requires different approaches for each of the 3 areas above. Ignoring (a) here. We've talked about 3 approaches for (b) and (c):

  • (i) runtime tricks: this includes ideas like @osa1 traps for Error (for c), or other ideas like logging inconsistent casts (for b), or make wasm have implicit coersions to match JS backends (alternative for b too)
  • (ii) static checks: this includes ideas like enabling new lints to detect issues at compile time. For example, avoid_catching_errors (for c, as @osa1 suggested earlier), invalid_runtime_check_with_js_interop_types (for b), and design new ones that are not covered today (turn as int into (as num).toInt)
  • (iii) Scope down the unsafe modes: as @mkustermann and I mentioned earlier, O3+ should not be the default behavior going forward. There may be engine/framework code that needs to elide some checks for performance. For those, we would push for a fine-grain approach (e.g. scoped pragmas). The hope is that most application code is spec compliant.

I propose we prioritize (iii) first, then (ii), then (i). With (iii) we will finally get apps closer to the spec and hopefully eliminate (c). I expect (ii) will help assure developers that they will not hit (b) because it can be analyzed statically. There may still be value in some solutions like (i), but the usefulness is limited since it is highly dependent on test coverage.

@osa1
Copy link
Member

osa1 commented Oct 31, 2024

I propose we prioritize (iii) first, then (ii), then (i). With (iii) we will finally get apps closer to the spec and hopefully eliminate (c). I expect (ii) will help assure developers that they will not hit (b) because it can be analyzed statically. There may still be value in some solutions like (i), but the usefulness is limited since it is highly dependent on test coverage.

SGTM, but one concern here may be that the scoped pragmas won't affect indirectly called code. If I index a List directly I can add a pragma and avoid the bounds check, but if I'm calling a third-party library that indexes a list I can't add a pragma to the call site and make the library do unchecked indexing.

Converting errors to being uncatchable seems like something that should have an opt-in flag, and not something that should be default behavior. Because it's not spec-compliant behavior.

My point is I should be able to test and debug my app in the same compilation mode/semantics as the release builds. You can do this today, but you have to figure out the right set of flags yourself and somehow pass them to the compiler (e.g. via the flutter CLI). What I suggest is just making the semantics of release and debug builds match.

Then we can discuss whether the release mode is spec compliant, and maybe fix it.

Right now control flow of my program can be different in release and debug builds, I'm merely suggesting that we should fix this.

@sigmundch
Copy link
Member Author

one concern here may be that the scoped pragmas won't affect indirectly called code

Agree. I hope it is rare enough, that we can live with the performance cost. If it becomes critical, I'd lean towards other fine-grain alternatives (like adding a way to override the pragma from a configuration file passed to the compiler). It's not ergonomic, but probably OK for the rare cases.

My point is I should be able to test and debug my app in the same compilation mode/semantics as the release builds

I partially agree :)

I agree that you should get consistent semantics between debug and release. The flutter-cli should guarantee that, but not as an orthogonal aspect of the specified semantics. There is a risk that by building consistency of unspecified behavior, then it may become a de facto expected behavior.

One of the reasons I think it should be coupled with the spec is that it's likely that the debug and release modes are delivered by different implementations.

That's already true for JS, flutter run uses DDC, flutter build uses dart2js. Only dart2js supports -O4, so in debug mode behaves more like -O2 (via DDC).

It's likely DDC will be used during development when deploying with dart2wasm, too (and yes, there are other gaps, like in64, that need to be addressed to make that work well).

That said, we had many scenarios where developers needed to debug code in -O4, rather than using the release build, we historically overloaded the use of flutter's --profile mode. In our case, the --profile flag compiles with dart2js using mostly the same options as --release, but prevents name minification that make the code harder to debug. I wonder if what we are looking here is something similar for wasm?

I believe that if we address (iii), the flutter-cli will then finally guarantee the consistency we seek. The risk of unspecified behavior (and thus inconsistencies from it) goes down dramatically when only code that is intentionally designed to elide checks is opted-in via pragmas.

@lrhn
Copy link
Member

lrhn commented Nov 1, 2024

What I suggest is just making the semantics of release and debug builds match.

Only if that is the spec compliant semantics 😁

The purpose of running tests is to find bugs before they reach production.
Debug builds (development builds) are for finding bugs. Turning off checks in that mode is counter to it's purpose.

The production compiler's unsafe optimizations may hide bugs, so running tests with those optimizations of counterproductive.
It would risk people writing code that only works with checks removed, and never noticing.
Or happily embrace that they have a version of Dart with a consistent different semantics than real Dart.

I think the risk of code hiding a real bug in a way that becomes visible only in production mode, because it fails differently, of a much smaller risk, and a more manageable one, than allowing invalid behavior everywhere.

Bugs should fail early. If they don't fail until reaching production, that's annoying, but a bug not falling at all is a much worse problem.
It's still a bug, it may just be a data bug instead of a control flow bug then.
(You would rather have your program crash in production than having it charge the wrong price for a product!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-dart2wasm Issues for the dart2wasm compiler. area-web Use area-web for Dart web related issues, including the DDC and dart2js compilers and JS interop. web-dart2js
Projects
None yet
Development

No branches or pull requests

6 participants