[flang][OpenMP] Handle usage of array elements in loop-control expressions #128

Merged

@ergawy ergawy commented Aug 1, 2024

Extends the fix-up logic for the trip_count calculation in target regions. Previously, if an array element was used to compute any of the loop bounds, the trip-count calculation ops would extract array elements from the mapped declaration of the array inside the target region. This commit handles that situation.

Hopefully, fixes https://ontrack-internal.amd.com/browse/SWDEV-476122
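
As a rough illustration of the case described above, the following Python sketch models a loop whose bounds come from array elements. It is not the flang implementation: `trip_count` and `host_trip_count` are hypothetical names, and the sketch assumes the trip count is evaluated on the host, so the bounds have to be read from the host-side array rather than from its mapped copy inside the target region.

```python
def trip_count(lb, ub, step=1):
    """Iteration count of a Fortran-style `do i = lb, ub, step` loop."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # A loop whose bounds are already "past the end" runs zero times.
    return max((ub - lb + step) // step, 0)


def host_trip_count(bounds, lb_idx, ub_idx):
    """Hypothetical helper: read the bound elements on the host first,
    so only plain scalars feed the trip-count computation."""
    lb = bounds[lb_idx]
    ub = bounds[ub_idx]
    return trip_count(lb, ub)


# Models `do i = bounds(1), bounds(2)` with bounds = [1, 100].
iterations = host_trip_count([1, 100], 0, 1)
```

The point of the sketch is only the ordering: the element loads happen where the array's host declaration is visible, and the trip count itself depends on scalars.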

Author

Dominik suggested to convert this to a smoke test instead. I will do that in a separate PR.

@asitav asitav left a comment

The PR fixes the build error observed with flang-new when array elements are used in loop bounds in a target region.

@ergawy ergawy force-pushed the handle_array_elems_in_loop_control branch from 9e00764 to b50e1d1 Compare August 3, 2024 11:30
…sions

Extends the fix-up logic for `trip_count` calculation in `target`
regions. Previously, if an array element was used to compute any of the
loop bounds, the trip-count calculation ops would extract array elements
from the mapped declaration of the array inside the target region. This
commit handles that situation.
@ergawy ergawy force-pushed the handle_array_elems_in_loop_control branch from b50e1d1 to ad1cf4c Compare August 7, 2024 12:42
@ergawy ergawy merged commit b7fbf3a into ROCm:amd-trunk-dev Aug 7, 2024
2 of 4 checks passed
searlmc1 pushed a commit that referenced this pull request Sep 14, 2024
This patch does the following:
1. Add support for optimizing the address mode of HVX load/store
instructions
2. Reduce the value of Add instruction immediates by replacing them with the
difference from other Addi instructions that share a common base:

For example, if we have the below sequence of instructions:

       r1 = add(r2,# 1024)
            ...
       r3 = add(r2,# 1152)
            ...
       r4 = add(r2,# 1280)

where the register r2 has the same reaching definition, they get
modified to the below sequence:

       r1 = add(r2,# 1024)
            ...
       r3 = add(r1,# 128)
            ...
       r4 = add(r1,# 256)
3. Fixes a bug in the pass where the addi instructions were modified based on a
predicated register definition, leading to incorrect output.

E.g.:
         INST-1: if (p0) r2 = add(r13,# 128)
         INST-2: r1 = add(r2,# 1024)
         INST-3: r3 = add(r2,# 1152)
         INST-4: r5 = add(r2,# 1280)

In the above case, since r2's definition is predicated, we do not want
to modify the uses of r2 in INST-3/INST-4 with add(r1,#128/256).

4. Fixes a corner case

It looks like we never check whether the offset register is actually
live (not clobbered) at the optimization site. Add a check that it is
live at the MBB entry. The rest should have already been verified.

5. Fixes bad codegen

For whatever reason, we do the transformation without checking whether the
value in the register actually reaches the user. This is the second identical
fix for this pass.

   Co-authored-by: Anirudh Sundar <quic_sanirudh@quicinc.com>
   Co-authored-by: Sergei Larin <slarin@quicinc.com>
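
The immediate-reduction rewrite and the predicated-definition guard described in the commit message above can be modelled with a small Python sketch. This is a simplified straight-line model, not the Hexagon pass itself: the `Add` record and `reduce_add_immediates` are hypothetical names, a single basic block is assumed, and "same reaching definition" is approximated by tracking which registers have been redefined since the anchoring add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Add:
    dest: str
    base: str
    imm: int
    predicated: bool = False  # True models `if (p0) dest = add(base, #imm)`


def reduce_add_immediates(block):
    """Rewrite later adds on a shared base to use the first add's result."""
    anchors = {}    # base register -> (anchor dest register, anchor immediate)
    unsafe = set()  # registers whose latest definition was predicated
    out = []
    for inst in block:
        anchor = anchors.get(inst.base)
        # Never rewrite a predicated add or a use of a predicated definition.
        rewritable = not inst.predicated and inst.base not in unsafe
        if rewritable and anchor is not None:
            anchor_reg, anchor_imm = anchor
            out.append(Add(inst.dest, anchor_reg, inst.imm - anchor_imm))
        else:
            out.append(inst)
        # Redefining a register invalidates anchors that rely on its old value.
        anchors.pop(inst.dest, None)
        anchors = {b: a for b, a in anchors.items() if a[0] != inst.dest}
        if inst.predicated:
            unsafe.add(inst.dest)
        else:
            unsafe.discard(inst.dest)
            if rewritable and anchor is None and inst.dest != inst.base:
                anchors[inst.base] = (inst.dest, inst.imm)
    return out


block = [Add("r1", "r2", 1024), Add("r3", "r2", 1152), Add("r4", "r2", 1280)]
rewritten = reduce_add_immediates(block)
# rewritten[1] is Add("r3", "r1", 128) and rewritten[2] is Add("r4", "r1", 256)
```

Dropping an anchor whenever its register is redefined is the model's stand-in for the liveness and reaching-value checks in items 4 and 5; the real pass works on machine basic blocks and presumably needs more than this.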
3 participants