Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][OpenMP] Add initial pass to map
do concurrent
to OMP #48[flang][OpenMP] Add initial pass to map
do concurrent
to OMP #48Changes from 1 commit
e4d6564
385768a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds to me like "we want to add the lower bound, upper bound and step variables to a firstprivate clause for the parallel region". So, if I understand it correctly, maybe we don't want to do much here, but rather rely on delayed privatization when that's implemented for
omp.parallel
. In the meantime, your implementation below at least addresses the constant bounds case. I'd suggest emitting an error if any of these variables is not a constant or type cast (otherwise it might silently produce incorrect code or crash) and also check that they indeed have a defining op before dereferencing it, or using function arguments would make this crash altogether.If we want the first implementation to support something other than just constants and we can't wait for delayed privatization, maybe the approach with the least friction is to try to replicate here the "firstprivate" clause handling for parallel regions currently implemented in PFT to MLIR lowering (between OpenMP.cpp and the
DataSharingProcessor
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some checks to make sure defining ops meet the current restrictions of the pass.
Both delayed privatization and do concurrent mapping are in-flight at the moment. So indeed once delayed privatization is more complete and the switch in flang is flipped we should move this to use delayed privatization. I think that should happen soon 🤞. So once I get to beef up this pass a bit more, we indeed either use delayed privatization directly or replicate some of the logic in OMP lowering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be replaced by a simpler check for live-ins into the loop region? That is, all values that are defined outside but used inside of it. Then, we could have some logic to decide whether each of these values should be private, firstprivate or shared and produce the corresponding
omp.private
ops and block args.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense. But do you mind leaving that out for now and gradually work up the support for more complex loops as we go? For now, my immediate focus is to map a simple loop like the one in the test to the device instead of the host. So I am focusing on simple loops for both the device and host and will move up to more complex stuff after that.