Adding resize(PadOp) vectorization analysis #3321
base: main
Conversation
This reverts commit d0addc4.
Added support for lowering `TernaryOp::where` with a vectorization factor, i.e.

```
predicate ? loadGlobalToLocal<...>(&dst[0], &src[i_src]) : dst.set(0.0f)
```

Currently this can only be done via manual scheduling. The follow-up PR on vectorization analysis (#3321) will make this applied automatically.
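For readers unfamiliar with the lowered pattern, here is a minimal host-side sketch of its semantics: a vectorized chunk is either copied whole from the source or the destination is filled with the pad value. The function name `predicatedVectorLoad` and the use of `memcpy` as a stand-in for the device-side vector load are assumptions for illustration, not nvfuser APIs.

```cpp
#include <cstring>

// Sketch of `predicate ? loadGlobalToLocal<...>(&dst[0], &src[i_src]) : dst.set(0.0f)`:
// either the whole VEC-wide chunk is copied, or the whole chunk is zero-filled.
template <int VEC>
void predicatedVectorLoad(float* dst, const float* src, bool predicate) {
  if (predicate) {
    // Stands in for the vectorized global-to-local load.
    std::memcpy(dst, src, VEC * sizeof(float));
  } else {
    // Out-of-bounds region of the pad: fill with the pad value.
    for (int i = 0; i < VEC; ++i) {
      dst[i] = 0.0f;
    }
  }
}
```

The key point is that the predicate is evaluated once per vector, not once per element, which is what makes the vectorized load legal.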
!test
Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
Overall it looks good. I'd just like the few things I commented on to be addressed.
!test --pybench
Initiated testing with Python benchmarks, just in case.
Thanks, I'll address the issues you brought up, and also run through some real-size problems so we get a taste of the perf impact. 🙇
!test --pybench
!test --pybench
Adding conditional support of resize in vectorization analysis. This PR allows vectorized loads on `PadOp` directly, without using a cache load, which improves the performance of the generated kernel.

What's in this PR:

1. Add a propagation rule for resize in vectorization analysis. The propagation rule works as:
   i. For a supported resize: a) project the resize op to the frontier and clear `(frontier.begin(), resize_position)`; b) add the projected extent of the new resize op as `gcd(id_from, resize_op->leftExpand(), resize_op->rightExpand())`.
   ii. For an unsupported resize: clear `[frontier.begin(), resize_position]`; no behavior change.
2. Update `TensorView::cacheAfter` to opt in a set of uses to cache while leaving other uses unchanged. This is necessary for cases where an input is used by a `PadOp` as well as by other operations that rely on a cached load for vectorization.
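The projected-extent rule above can be sketched as follows. This is a minimal standalone illustration, assuming the extents are plain integers; the function name `projectedResizeExtent` is hypothetical and the real analysis operates on symbolic `Val`s in nvfuser.

```cpp
#include <cstdint>
#include <numeric>

// Sketch of the supported-resize rule: the vectorizable extent through a pad
// is limited by the alignment of both pad amounts as well as the extent of
// the incoming iter domain, hence the gcd of all three.
int64_t projectedResizeExtent(int64_t id_from_extent,
                              int64_t left_expand,
                              int64_t right_expand) {
  // gcd(x, 0) == x, so a pad of 0 on one side does not constrain the factor.
  return std::gcd(std::gcd(id_from_extent, left_expand), right_expand);
}
```

For example, an input extent of 16 padded by 4 on the left and 8 on the right would admit a vectorization factor of at most 4, while an unpadded dimension keeps its full extent.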
Follow up to #3261.
Work toward supporting RoPE performance. Design doc: