Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monomorphize dropped functions #6734

Merged
merged 36 commits into from
Jul 12, 2024
Merged

Monomorphize dropped functions #6734

merged 36 commits into from
Jul 12, 2024

Conversation

kripken
Copy link
Member

@kripken kripken commented Jul 11, 2024

We now consider a drop to be part of the call context: If we see

(drop
  (call $foo)
)

(func $foo (result i32)
  (i32.const 42)
)

Then we'd monomorphize to this:

(call $foo_1)  ;; call the specialized function instead

(func $foo_1   ;; the specialized function returns nothing
  (drop        ;; the drop was moved into here
    (i32.const 42)
  )
)

With the drop now in the called function, we may be able to optimize out unused work.

Refactor a bit of code out of DAE that we can reuse here, into a new return-utils.h.

@kripken kripken requested a review from tlively July 11, 2024 22:55
@kripken
Copy link
Member Author

kripken commented Jul 11, 2024

Oops, fuzzer quickly found that that old code I was so happy to reuse does not support return calls yet. DAE must handle those some other way. Should be an easy fix though.

curr->finalize();
refinalize = true;

replaceCurrent(Builder(*getModule()).makeDrop(curr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be followed by a return to avoid going to on execute subsequent code that should have been dead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks, the semantics include the return 😄 ... Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, and now that I think about it, this also needs to do the thing where it breaks out of the function body before doing the call to make sure exceptions propagate correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. And actually, thinking more on this, we should not be changing return calls here. This is not like inlining where we remove a call, so removing the return part of a return call is ok - here we'd just be removing the return part in a copy of the function. So we could break programs by blowing up the stack.

To fix that, I made the pass not monomorphize drops when they'd cause us to need to remove a return call's return.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we shouldn't optimize return calls in monomorphized bodies. If you know the caller drops the result, you can change a return call in the callee to be a branch + call + drop sequence, which can allow further optimizations like monomorphizing the former return-callee now that we see that its result is also dropped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can definitely help in some cases, yeah, but it can break others. Imagine that A calls B which calls A etc. in some very deep stack before it stops. If we monomorphize them both, then even if we save 90% of the work inside them, if the calls are no longer return calls then we'd be using potentially enough stack to error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. FWIW, inlining also has this problem because the frame sizes increase with the depth of inlining, even through tail calls. I don't think that's enough reason to avoid optimizing, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry less about frame sizes increasing as that's generally going to be a "moderate" factor, like maybe the frame doubles in size. Maybe it even 10x grows somehow? But if we turn tail calls into normal calls for a language that does loops using tail calls then we can turn what were 1,000,000 loop iterations into 1,000,000 nested calls, can't we? 😱

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess if we wanted to avoid that problem, we would want to make the monomorphization of a function containing a return call contingent on the monomorphization of the return-callee as well. So the entire return-call graph reachable from the original monomorphized function would be monomorphized as a unit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's right.

Another option might be to only monomorphize in cases that we shrink the called function so much it will get inlined immediately, which eliminates a call.

Comment on lines 258 to 261
;; This is monomorphized in ALWAYS (as the drop means the call context is
;; not trivial), but not in CAREFUL mode (as there is no significant benefit
;; from optimizations on the monomorphized function - the import is opaque
;; work we cannot do anything with, even if we see it is handed consts).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is precomputing the xor and add not enough of a benefit?

Copy link
Member Author

@kripken kripken Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our cost analysis may not be precise enough, but we have this:

(call $import
 (i32.xor
  (local.get $0)
  (local.get $1)
 )
 (i32.add
  (local.get $0)
  (local.get $1)
 )
)

vs

(drop
 (call $import
  (i32.const 7)
  (i32.const 7)
 )
)

The call is the same. local.get has 0 cost (we assume it might already be in a register), so that vanishes. drop also has cost 0. And we give consts and super-simple math operations like i32.add a cost of 1, so they end up the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, consts and simple math having the same cost means that we don't prefer replacing math with consts. Could the code size difference tip the scales if it were large enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to also measure code size for that, which it looks like we're not doing. In this case, I think that is the right outcome actually: We are duplicating code, so even if we shrink the function we are still growing the total size. So this only makes sense if we actually remove work.

Though in general I think you're right, it's not obvious that considering consts and simple math as the same is right. I think we did simple measurements a decade ago and that seemed ok, but I'm not sure how accurate those were then, much less now... worth revisiting.

@kripken
Copy link
Member Author

kripken commented Jul 12, 2024

The compile error here happens on clang but not gcc... I'm not sure yet which of them is right.

@kripken
Copy link
Member Author

kripken commented Jul 12, 2024

Actually neither... the issue was using std::unordered_map in a way that differed between C++ STL impls (yikes). I worked around it.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, since we can leave optimization of return calls to a follow-up if we decide to move forward with that.

One thing that might read test readability, though, would be to use separate modules for the separate cases to avoid the monomorphized functions all gathering separately at the bottom of the file.

@kripken
Copy link
Member Author

kripken commented Jul 12, 2024

Good point, thanks. I split them up more now. The cost is repeating a few imports and such but reading it now it definitely looks clearer, with less distance from the function to its monomorphized version.

@kripken kripken merged commit d2a48af into WebAssembly:main Jul 12, 2024
13 checks passed
@kripken kripken deleted the monodrop branch July 12, 2024 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants