-
Notifications
You must be signed in to change notification settings - Fork 721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monomorphize dropped functions #6734
Conversation
Oops, fuzzer quickly found that that old code I was so happy to reuse does not support return calls yet. DAE must handle those some other way. Should be an easy fix though. |
src/ir/return-utils.h
Outdated
curr->finalize(); | ||
refinalize = true; | ||
|
||
replaceCurrent(Builder(*getModule()).makeDrop(curr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be followed by a return
to avoid going to on execute subsequent code that should have been dead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks, the semantics include the return 😄 ... Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, and now that I think about it, this also needs to do the thing where it breaks out of the function body before doing the call to make sure exceptions propagate correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. And actually, thinking more on this, we should not be changing return calls here. This is not like inlining where we remove a call, so removing the return part of a return call is ok - here we'd just be removing the return part in a copy of the function. So we could break programs by blowing up the stack.
To fix that, I made the pass not monomorphize drops when they'd cause us to need to remove a return call's return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we shouldn't optimize return calls in monomorphized bodies. If you know the caller drops the result, you can change a return call in the callee to be a branch + call + drop sequence, which can allow further optimizations like monomorphizing the former return-callee now that we see that its result is also dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can definitely help in some cases, yeah, but it can break others. Imagine that A calls B which calls A etc. in some very deep stack before it stops. If we monomorphize them both, then even if we save 90% of the work inside them, if the calls are no longer return calls then we'd be using potentially enough stack to error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. FWIW, inlining also has this problem because the frame sizes increase with the depth of inlining, even through tail calls. I don't think that's enough reason to avoid optimizing, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry less about frame sizes increasing as that's generally going to be a "moderate" factor, like maybe the frame doubles in size. Maybe it even 10x grows somehow? But if we turn tail calls into normal calls for a language that does loops using tail calls then we can turn what were 1,000,000 loop iterations into 1,000,000 nested calls, can't we? 😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess if we wanted to avoid that problem, we would want to make the monomorphization of a function containing a return call contingent on the monomorphization of the return-callee as well. So the entire return-call graph reachable from the original monomorphized function would be monomorphized as a unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's right.
Another option might be to only monomorphize in cases that we shrink the called function so much it will get inlined immediately, which eliminates a call.
;; This is monomorphized in ALWAYS (as the drop means the call context is | ||
;; not trivial), but not in CAREFUL mode (as there is no significant benefit | ||
;; from optimizations on the monomorphized function - the import is opaque | ||
;; work we cannot do anything with, even if we see it is handed consts). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is precomputing the xor and add not enough of a benefit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our cost analysis may not be precise enough, but we have this:
(call $import
(i32.xor
(local.get $0)
(local.get $1)
)
(i32.add
(local.get $0)
(local.get $1)
)
)
vs
(drop
(call $import
(i32.const 7)
(i32.const 7)
)
)
The call is the same. local.get
has 0 cost (we assume it might already be in a register), so that vanishes. drop
also has cost 0. And we give consts and super-simple math operations like i32.add
a cost of 1, so they end up the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, consts and simple math having the same cost means that we don't prefer replacing math with consts. Could the code size difference tip the scales if it were large enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to also measure code size for that, which it looks like we're not doing. In this case, I think that is the right outcome actually: We are duplicating code, so even if we shrink the function we are still growing the total size. So this only makes sense if we actually remove work.
Though in general I think you're right, it's not obvious that considering consts and simple math as the same is right. I think we did simple measurements a decade ago and that seemed ok, but I'm not sure how accurate those were then, much less now... worth revisiting.
The compile error here happens on clang but not gcc... I'm not sure yet which of them is right. |
Actually neither... the issue was using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, since we can leave optimization of return calls to a follow-up if we decide to move forward with that.
One thing that might read test readability, though, would be to use separate modules for the separate cases to avoid the monomorphized functions all gathering separately at the bottom of the file.
Good point, thanks. I split them up more now. The cost is repeating a few imports and such but reading it now it definitely looks clearer, with less distance from the function to its monomorphized version. |
We now consider a drop to be part of the call context: If we see
Then we'd monomorphize to this:
With the drop now in the called function, we may be able to optimize out unused work.
Refactor a bit of code out of DAE that we can reuse here, into a new
return-utils.h
.