-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SLP]Add cost estimation for gather node reshuffling #115201
base: main
Are you sure you want to change the base?
[SLP]Add cost estimation for gather node reshuffling #115201
Conversation
Created using spr 1.3.5
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-transforms Author: Alexey Bataev (alexey-bataev) ChangesAdds cost estimation for the variants of the permutations of the scalar X86, AVX512, -O3+LTO Program size..text CINT2006/445.gobmk - extra code vectorized RISC-V, sifive-p670, -O3+LTO Patch is 79.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115201.diff 9 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 53e514766fee81..48419699f9cd53 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -4846,8 +4846,21 @@ getShuffleCost(const TargetTransformInfo &TTI, TTI::ShuffleKind Kind,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
int Index = 0, VectorType *SubTp = nullptr,
ArrayRef<const Value *> Args = {}) {
- if (Kind != TTI::SK_PermuteTwoSrc)
+ if (Kind != TTI::SK_PermuteTwoSrc) {
+ int SplatIdx = PoisonMaskElem;
+ if (!Mask.empty() && all_of(Mask, [&](int Idx) {
+ if (Idx == PoisonMaskElem)
+ return true;
+ if (SplatIdx == PoisonMaskElem) {
+ SplatIdx = Idx;
+ return true;
+ }
+ return SplatIdx == Idx;
+ }))
+ return TTI.getShuffleCost(TTI::SK_Broadcast, Tp, Mask, CostKind, Index,
+ SubTp, Args);
return TTI.getShuffleCost(Kind, Tp, Mask, CostKind, Index, SubTp, Args);
+ }
int NumSrcElts = Tp->getElementCount().getKnownMinValue();
int NumSubElts;
if (Mask.size() > 2 && ShuffleVectorInst::isInsertSubvectorMask(
@@ -10257,10 +10270,10 @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
Idx = EMask[Idx];
}
CommonVF = E->Scalars.size();
- } else if (std::optional<unsigned> Factor = E->getInterleaveFactor();
- Factor && E->Scalars.size() != Mask.size() &&
+ } else if (unsigned Factor = E->getInterleaveFactor();
+ Factor > 0 && E->Scalars.size() != Mask.size() &&
ShuffleVectorInst::isDeInterleaveMaskOfFactor(CommonMask,
- *Factor)) {
+ Factor)) {
// Deinterleaved nodes are free.
std::iota(CommonMask.begin(), CommonMask.end(), 0);
}
@@ -12935,6 +12948,7 @@ BoUpSLP::isGatherShuffledSingleRegisterEntry(
// No perfect match, just shuffle, so choose the first tree node from the
// tree.
Entries.push_back(FirstEntries.front());
+ VF = FirstEntries.front()->getVectorFactor();
} else {
// Try to find nodes with the same vector factor.
assert(UsedTEs.size() == 2 && "Expected at max 2 permuted entries.");
@@ -12975,6 +12989,8 @@ BoUpSLP::isGatherShuffledSingleRegisterEntry(
Entries.push_back(SecondEntries.front());
VF = std::max(Entries.front()->getVectorFactor(),
Entries.back()->getVectorFactor());
+ } else {
+ VF = Entries.front()->getVectorFactor();
}
}
@@ -13077,26 +13093,149 @@ BoUpSLP::isGatherShuffledSingleRegisterEntry(
// Pair.first is the offset to the vector, while Pair.second is the index of
// scalar in the list.
for (const std::pair<unsigned, int> &Pair : EntryLanes) {
- unsigned Idx = Part * VL.size() + Pair.second;
+ int Idx = Part * VL.size() + Pair.second;
Mask[Idx] =
Pair.first * VF +
(ForOrder ? std::distance(
Entries[Pair.first]->Scalars.begin(),
find(Entries[Pair.first]->Scalars, VL[Pair.second]))
: Entries[Pair.first]->findLaneForValue(VL[Pair.second]));
- IsIdentity &= Mask[Idx] == Pair.second;
+ IsIdentity &= Mask[Idx] % VL.size() == Idx % VL.size();
}
- switch (Entries.size()) {
- case 1:
- if (IsIdentity || EntryLanes.size() > 1 || VL.size() <= 2)
- return TargetTransformInfo::SK_PermuteSingleSrc;
- break;
- case 2:
- if (EntryLanes.size() > 2 || VL.size() <= 2)
- return TargetTransformInfo::SK_PermuteTwoSrc;
- break;
- default:
- break;
+ if (ForOrder || IsIdentity || Entries.empty()) {
+ switch (Entries.size()) {
+ case 1:
+ if (IsIdentity || EntryLanes.size() > 1 || VL.size() <= 2)
+ return TargetTransformInfo::SK_PermuteSingleSrc;
+ break;
+ case 2:
+ if (EntryLanes.size() > 2 || VL.size() <= 2)
+ return TargetTransformInfo::SK_PermuteTwoSrc;
+ break;
+ default:
+ break;
+ }
+ } else if (!isa<VectorType>(VL.front()->getType()) &&
+ (EntryLanes.size() > Entries.size() || VL.size() <= 2)) {
+ // Do the cost estimation if shuffle beneficial than buildvector.
+ SmallVector<int> SubMask(std::next(Mask.begin(), Part * VL.size()),
+ std::next(Mask.begin(), (Part + 1) * VL.size()));
+ int MinElement = SubMask.front(), MaxElement = SubMask.front();
+ for (int Idx : SubMask) {
+ if (Idx == PoisonMaskElem)
+ continue;
+ if (MinElement == PoisonMaskElem || MinElement % VF > Idx % VF)
+ MinElement = Idx;
+ if (MaxElement == PoisonMaskElem || MaxElement % VF < Idx % VF)
+ MaxElement = Idx;
+ }
+ assert(MaxElement >= 0 && MinElement >= 0 &&
+ "Expected at least single element.");
+ unsigned NewVF = std::max<unsigned>(
+ VL.size(), getFullVectorNumberOfElements(*TTI, VL.front()->getType(),
+ (MaxElement % VF) -
+ (MinElement % VF) + 1));
+ if (NewVF < VF) {
+ for_each(SubMask, [&](int &Idx) {
+ if (Idx == PoisonMaskElem)
+ return;
+ Idx = (Idx % VF) - (MinElement % VF) +
+ (Idx >= static_cast<int>(VF) ? NewVF : 0);
+ });
+ VF = NewVF;
+ }
+
+ constexpr TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+ auto *VecTy = getWidenedType(VL.front()->getType(), VF);
+ auto *MaskVecTy = getWidenedType(VL.front()->getType(), SubMask.size());
+ auto GetShuffleCost = [&,
+ &TTI = *TTI](ArrayRef<int> Mask,
+ ArrayRef<const TreeEntry *> Entries,
+ VectorType *VecTy) -> InstructionCost {
+ if (Entries.size() == 1 && Entries.front()->getInterleaveFactor() > 0 &&
+ ShuffleVectorInst::isDeInterleaveMaskOfFactor(
+ Mask, Entries.front()->getInterleaveFactor()))
+ return TTI::TCC_Free;
+ return ::getShuffleCost(TTI,
+ Entries.size() > 1 ? TTI::SK_PermuteTwoSrc
+ : TTI::SK_PermuteSingleSrc,
+ VecTy, Mask, CostKind);
+ };
+ InstructionCost ShuffleCost = GetShuffleCost(SubMask, Entries, VecTy);
+ InstructionCost FirstShuffleCost = 0;
+ SmallVector<int> FirstMask(SubMask.begin(), SubMask.end());
+ if (Entries.size() == 1 || !Entries[0]->isGather()) {
+ FirstShuffleCost = ShuffleCost;
+ } else {
+ // Transform mask to include only first entry.
+ APInt DemandedElts = APInt::getAllOnes(SubMask.size());
+ bool IsIdentity = true;
+ for (auto [I, Idx] : enumerate(FirstMask)) {
+ if (Idx >= static_cast<int>(VF)) {
+ Idx = PoisonMaskElem;
+ } else {
+ DemandedElts.clearBit(I);
+ if (Idx != PoisonMaskElem)
+ IsIdentity &= static_cast<int>(I) == Idx;
+ }
+ }
+ if (!IsIdentity)
+ FirstShuffleCost = GetShuffleCost(FirstMask, Entries.front(), VecTy);
+ FirstShuffleCost += TTI->getScalarizationOverhead(
+ MaskVecTy, DemandedElts, /*Insert=*/true,
+ /*Extract=*/false, CostKind);
+ }
+ InstructionCost SecondShuffleCost = 0;
+ SmallVector<int> SecondMask(SubMask.begin(), SubMask.end());
+ if (Entries.size() == 1 || !Entries[1]->isGather()) {
+ SecondShuffleCost = ShuffleCost;
+ } else {
+ // Transform mask to include only first entry.
+ APInt DemandedElts = APInt::getAllOnes(SubMask.size());
+ bool IsIdentity = true;
+ for (auto [I, Idx] : enumerate(SecondMask)) {
+ if (Idx < static_cast<int>(VF) && Idx >= 0) {
+ Idx = PoisonMaskElem;
+ } else {
+ DemandedElts.clearBit(I);
+ if (Idx != PoisonMaskElem) {
+ Idx -= VF;
+ IsIdentity &= static_cast<int>(I) == Idx;
+ }
+ }
+ }
+ if (!IsIdentity)
+ SecondShuffleCost = GetShuffleCost(SecondMask, Entries[1], VecTy);
+ SecondShuffleCost += TTI->getScalarizationOverhead(
+ MaskVecTy, DemandedElts, /*Insert=*/true,
+ /*Extract=*/false, CostKind);
+ }
+ APInt DemandedElts = APInt::getAllOnes(SubMask.size());
+ for (auto [I, Idx] : enumerate(SubMask))
+ if (Idx == PoisonMaskElem)
+ DemandedElts.clearBit(I);
+ InstructionCost BuildVectorCost =
+ TTI->getScalarizationOverhead(MaskVecTy, DemandedElts, /*Insert=*/true,
+ /*Extract=*/false, CostKind);
+ const TreeEntry *BestEntry = nullptr;
+ if (FirstShuffleCost < ShuffleCost) {
+ copy(FirstMask, std::next(Mask.begin(), Part * VL.size()));
+ BestEntry = Entries.front();
+ ShuffleCost = FirstShuffleCost;
+ }
+ if (SecondShuffleCost < ShuffleCost) {
+ copy(SecondMask, std::next(Mask.begin(), Part * VL.size()));
+ BestEntry = Entries[1];
+ ShuffleCost = SecondShuffleCost;
+ }
+ if (BuildVectorCost >= ShuffleCost) {
+ if (BestEntry) {
+ Entries.clear();
+ Entries.push_back(BestEntry);
+ }
+ return Entries.size() > 1 ? TargetTransformInfo::SK_PermuteTwoSrc
+ : TargetTransformInfo::SK_PermuteSingleSrc;
+ }
}
Entries.clear();
// Clear the corresponding mask elements.
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/reused-scalar-repeated-in-node.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/reused-scalar-repeated-in-node.ll
index d6073ea4bbbae6..96ff73f117a739 100644
--- a/llvm/test/Transforms/SLPVectorizer/AArch64/reused-scalar-repeated-in-node.ll
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/reused-scalar-repeated-in-node.ll
@@ -46,12 +46,12 @@ define void @test() {
; CHECK-NEXT: [[TMP16:%.*]] = phi <2 x float> [ poison, %[[BB77]] ], [ [[TMP31:%.*]], %[[BB78]] ]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <8 x float> [[TMP15]], <8 x float> poison, <16 x i32> <i32 0, i32 3, i32 1, i32 2, i32 3, i32 0, i32 2, i32 3, i32 2, i32 6, i32 2, i32 3, i32 0, i32 7, i32 6, i32 6>
; CHECK-NEXT: [[TMP18:%.*]] = fmul fast <16 x float> [[TMP17]], [[TMP13]]
-; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <8 x float> [[TMP15]], <8 x float> poison, <16 x i32> <i32 1, i32 poison, i32 0, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 1, i32 6, i32 7, i32 7>
+; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <8 x float> [[TMP15]], <8 x float> poison, <16 x i32> <i32 1, i32 poison, i32 0, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <16 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <16 x float> [[TMP19]], <16 x float> [[TMP20]], <16 x i32> <i32 0, i32 17, i32 2, i32 16, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <16 x float> [[TMP19]], <16 x float> [[TMP20]], <16 x i32> <i32 0, i32 17, i32 2, i32 16, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <8 x float> [[TMP15]], <8 x float> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <16 x float> [[TMP21]], <16 x float> [[TMP22]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 17, i32 6, i32 7, i32 8, i32 23, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
-; CHECK-NEXT: [[TMP24:%.*]] = shufflevector <16 x float> [[TMP23]], <16 x float> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 1, i32 5, i32 3, i32 1, i32 3, i32 9, i32 3, i32 1, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <16 x float> [[TMP21]], <16 x float> [[TMP22]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 17, i32 6, i32 7, i32 8, i32 23, i32 10, i32 11, i32 12, i32 22, i32 14, i32 15>
+; CHECK-NEXT: [[TMP24:%.*]] = shufflevector <16 x float> [[TMP23]], <16 x float> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 1, i32 5, i32 3, i32 1, i32 3, i32 9, i32 3, i32 1, i32 5, i32 13, i32 9, i32 9>
; CHECK-NEXT: [[TMP25:%.*]] = call <16 x float> @llvm.vector.insert.v16f32.v2f32(<16 x float> [[TMP14]], <2 x float> [[TMP0]], i64 2)
; CHECK-NEXT: [[TMP26:%.*]] = fmul fast <16 x float> [[TMP24]], [[TMP25]]
; CHECK-NEXT: [[TMP27:%.*]] = fadd fast <16 x float> [[TMP26]], [[TMP18]]
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
index 912d60d0cc3867..f57e5de07807e6 100644
--- a/llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
@@ -28,13 +28,9 @@ define i32 @test(ptr %pix1, ptr %pix2, i64 %idx.ext, i64 %idx.ext63, ptr %add.pt
; CHECK-NEXT: [[ADD_PTR64_1:%.*]] = getelementptr i8, ptr [[ADD_PTR64]], i64 [[IDX_EXT63]]
; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr i8, ptr [[ADD_PTR_1]], i64 4
; CHECK-NEXT: [[ARRAYIDX5_2:%.*]] = getelementptr i8, ptr [[ADD_PTR64_1]], i64 4
-; CHECK-NEXT: [[ARRAYIDX8_2:%.*]] = getelementptr i8, ptr [[ADD_PTR_1]], i64 1
; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i8>, ptr [[ADD_PTR_1]], align 1
-; CHECK-NEXT: [[TMP33:%.*]] = load i8, ptr [[ARRAYIDX8_2]], align 1
-; CHECK-NEXT: [[TMP29:%.*]] = load i8, ptr [[ADD_PTR_1]], align 1
; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <4 x i8> [[TMP4]], <4 x i8> poison, <2 x i32> <i32 0, i32 2>
; CHECK-NEXT: [[TMP21:%.*]] = zext <2 x i8> [[TMP19]] to <2 x i32>
-; CHECK-NEXT: [[TMP84:%.*]] = zext i8 [[TMP29]] to i32
; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i8>, ptr [[ADD_PTR64_1]], align 1
; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <4 x i8> [[TMP9]], <4 x i8> poison, <2 x i32> <i32 0, i32 2>
; CHECK-NEXT: [[TMP31:%.*]] = zext <2 x i8> [[TMP22]] to <2 x i32>
@@ -50,7 +46,6 @@ define i32 @test(ptr %pix1, ptr %pix2, i64 %idx.ext, i64 %idx.ext63, ptr %add.pt
; CHECK-NEXT: [[TMP30:%.*]] = add <2 x i32> [[TMP25]], [[TMP23]]
; CHECK-NEXT: [[TMP32:%.*]] = shufflevector <4 x i8> [[TMP4]], <4 x i8> poison, <2 x i32> <i32 1, i32 3>
; CHECK-NEXT: [[TMP51:%.*]] = zext <2 x i8> [[TMP32]] to <2 x i32>
-; CHECK-NEXT: [[TMP83:%.*]] = zext i8 [[TMP33]] to i32
; CHECK-NEXT: [[TMP56:%.*]] = shufflevector <4 x i8> [[TMP9]], <4 x i8> poison, <2 x i32> <i32 1, i32 3>
; CHECK-NEXT: [[TMP57:%.*]] = zext <2 x i8> [[TMP56]] to <2 x i32>
; CHECK-NEXT: [[TMP35:%.*]] = sub <2 x i32> [[TMP51]], [[TMP57]]
@@ -61,246 +56,203 @@ define i32 @test(ptr %pix1, ptr %pix2, i64 %idx.ext, i64 %idx.ext63, ptr %add.pt
; CHECK-NEXT: [[TMP36:%.*]] = sub <2 x i32> [[TMP39]], [[TMP61]]
; CHECK-NEXT: [[TMP37:%.*]] = shl <2 x i32> [[TMP36]], splat (i32 16)
; CHECK-NEXT: [[TMP42:%.*]] = add <2 x i32> [[TMP37]], [[TMP35]]
-; CHECK-NEXT: [[TMP43:%.*]] = add <2 x i32> [[TMP42]], [[TMP30]]
; CHECK-NEXT: [[TMP44:%.*]] = sub <2 x i32> [[TMP30]], [[TMP42]]
-; CHECK-NEXT: [[TMP73:%.*]] = extractelement <2 x i32> [[TMP43]], i32 0
-; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x i32> [[TMP43]], i32 1
-; CHECK-NEXT: [[ADD48_2:%.*]] = add i32 [[TMP34]], [[TMP73]]
-; CHECK-NEXT: [[TMP47:%.*]] = extractelement <2 x i32> [[TMP44]], i32 0
-; CHECK-NEXT: [[TMP48:%.*]] = extractelement <2 x i32> [[TMP44]], i32 1
-; CHECK-NEXT: [[ADD55_2:%.*]] = add i32 [[TMP48]], [[TMP47]]
; CHECK-NEXT: [[ARRAYIDX5_3:%.*]] = getelementptr i8, ptr null, i64 4
+; CHECK-NEXT: [[ARRAYIDX22_3:%.*]] = getelementptr i8, ptr null, i64 2
+; CHECK-NEXT: [[ARRAYIDX27_3:%.*]] = getelementptr i8, ptr null, i64 6
+; CHECK-NEXT: [[TMP33:%.*]] = load i8, ptr [[ARRAYIDX27_3]], align 1
+; CHECK-NEXT: [[ARRAYIDX34_3:%.*]] = getelementptr i8, ptr null, i64 3
+; CHECK-NEXT: [[TMP34:%.*]] = load i8, ptr [[ARRAYIDX34_3]], align 1
+; CHECK-NEXT: [[TMP43:%.*]] = load i8, ptr null, align 1
+; CHECK-NEXT: [[ARRAYIDX39_3:%.*]] = getelementptr i8, ptr null, i64 7
+; CHECK-NEXT: [[TMP47:%.*]] = load i8, ptr [[ARRAYIDX39_3]], align 1
; CHECK-NEXT: [[TMP53:%.*]] = load <2 x i8>, ptr null, align 1
+; CHECK-NEXT: [[TMP48:%.*]] = load <2 x i8>, ptr [[ARRAYIDX5_3]], align 1
+; CHECK-NEXT: [[TMP60:%.*]] = load <4 x i8>, ptr null, align 1
; CHECK-NEXT: [[TMP52:%.*]] = load i8, ptr null, align 1
-; CHECK-NEXT: [[TMP62:%.*]] = zext <2 x i8> [[TMP53]] to <2 x i32>
+; CHECK-NEXT: [[TMP72:%.*]] = shufflevector <4 x i8> [[TMP60]], <4 x i8> poison, <2 x i32> <i32 2, i32 poison>
+; CHECK-NEXT: [[TMP78:%.*]] = insertelement <2 x i8> [[TMP72]], i8 [[TMP52]], i32 1
+; CHECK-NEXT: [[TMP62:%.*]] = zext <2 x i8> [[TMP78]] to <2 x i32>
; CHECK-NEXT: [[TMP77:%.*]] = zext i8 [[TMP52]] to i32
-; CHECK-NEXT: [[TMP54:%.*]] = load <2 x i8>, ptr null, align 1
+; CHECK-NEXT: [[TMP80:%.*]] = insertelement <2 x ptr> <ptr poison, ptr null>, ptr [[ARRAYIDX22_3]], i32 0
+; CHECK-NEXT: [[TMP54:%.*]] = call <2 x i8> @llvm.masked.gather.v2i8.v2p0(<2 x ptr> [[TMP80]], i32 1, <2 x i1> splat (i1 true), <2 x i8> poison)
; CHECK-NEXT: [[TMP55:%.*]] = zext <2 x i8> [[TMP54]] to <2 x i32>
; CHECK-NEXT: [[TMP59:%.*]] = sub <2 x i32> [[TMP62]], [[TMP55]]
; CHECK-NEXT: [[TMP41:%.*]] = call <2 x i8> @llvm.experimental.vp.strided.load.v2i8.p0.i64(ptr align 1 null, i64 4, <2 x i1> splat (i1 true), i32 2)
; CHECK-NEXT: [[TMP58:%.*]] = zext <2 x i8> [[TMP41]] to <2 x i32>
-; CHECK-NEXT: [[TMP60:%.*]] = shufflevector <2 x i32> [[TMP58]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
-; CHECK-NEXT: [[TMP63:%.*]] = load <2 x i8>, ptr [[ARRAYIDX5_3]], align 1
+; CHECK-NEXT: [[TMP83:%.*]] = shufflevector <2 x i8> [[TMP48]], <2 x i8> poison, <2 x i32> <i32 poison, i32 0>
+; CHECK-NEXT: [[TMP63:%.*]] = insertelement <2 x i8> [[TMP83]], i8 [[TMP33]], i32 0
; CHECK-NEXT: [[TMP76:%.*]] = zext <2 x i8> [[TMP63]] to <2 x i32>
-; CHECK-NEXT: [[TMP45:%.*]] = sub <2 x i32> [[TMP60]], [[TMP76]]
-; CHECK-NEXT: [[TMP46:%.*]] = shl <2 x i32> [[TMP45]], splat (i32 16)
-; CHECK-NEXT: [[TMP90:%.*]] = add <2 x i32> [[TMP46]], [[TMP59]]
-; CHECK-NEXT: [[ARRAYIDX20_3:%.*]] = getelementptr i8, ptr null, i64 2
-; CHECK-NEXT: [[ARRAYIDX22_3:%.*]] = getelementptr i8, ptr null, i64 2
-; CHECK-NEXT: [[ARRAYIDX27_3:%.*]] = getelementptr i8, ptr null, i64 6
-; CHECK-NEXT: [[TMP64:%.*]] = load <2 x i8>, ptr [[ARRAYIDX20_3]], align 1
+; CHECK-NEXT: [[TMP81:%.*]] = sub <2 x i32> [[TMP58]], [[TMP76]]
+; CHECK-NEXT: [[TMP167:%.*]] = shl <2 x i32> [[TMP81]], splat (i32 16)
+; CHECK-NEXT: [[TMP168:%.*]] = add <2 x i32> [[TMP167]], [[TMP59]]
+; CHECK-NEXT: [[TMP64:%.*]] = shufflevector <4 x i8> [[TMP60]], <4 x i8> poison, <2 x i32> <i32 3, i32 1>
; CHECK-NEXT: [[TMP79:%.*]] = zext <2 x i8> [[TMP64]] to <2 x i32>
-; CHECK-NEXT: [[TMP82:%.*]] = load <2 x i8>, ptr [[ARRAYIDX22_3]], align 1
+; CHECK-NEXT: [[TMP82:%.*]] = insertelement <2 x i8> [[TMP53]], i8 [[TMP34]], i32 0
; CHECK-NEXT: [[TMP91:%.*]] = zext <2 x i8> [[TMP82]] to <2 x i32>
; CHECK-NEXT: [[TMP65:%.*]] = sub <2 x i32> [[TMP79]], [[TMP91]]
-; CHECK-NEXT: [[TMP75:%.*]] = call <2 x i8> @llvm.masked.gather.v2i8.v2p0(<2 x ptr> zeroinitializer, i32 1, <2 x i1> splat (i1 true), <2 x i8> poison)
-; CHECK-NEXT: [[TMP98:%.*]] = zext <2 x i8> [[TMP75]] to <2 x i32>
-; CHECK-NEXT: [[TMP100:%.*]] = load <2 x i8>, ptr [[ARRAYIDX27_3]], align 1
-; CHECK-NEXT: [[TMP103:%.*]] = zext <2 x i8> [[TMP100]] to <2 x i32>
-; CHECK-NEXT: [[TMP69:%.*]] = sub <2 x i32> [[TMP98]], ...
[truncated]
|
Ping! |
1 similar comment
Ping! |
Created using spr 1.3.5
Adds cost estimation for the variants of the permutations of the scalar
values, used in gather nodes. Currently, SLP just unconditionally emits
shuffles for the reused buildvectors, but in some cases better to leave
them as buildvectors rather than shuffles, if the cost of such
buildvectors is better.
X86, AVX512, -O3+LTO
Metric: size..text
Program size..text
results results0 diff
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 912998.00 913238.00 0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 203070.00 203102.00 0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1396320.00 1396448.00 0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1396320.00 1396448.00 0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309790.00 309678.00 -0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12477607.00 12470807.00 -0.1%
CINT2006/445.gobmk - extra code vectorized
MiBench/consumer-lame - small variations
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorized code
Benchmarks/Bullet - extra code vectorized
CFP2017rate/526.blender_r - extra vector code
RISC-V, sifive-p670, -O3+LTO
CFP2006/433.milc - regressions, should be fixed by #115173
CFP2006/453.povray - extra vectorized code
CFP2017rate/508.namd_r - better vector code
CFP2017rate/510.parest_r - extra vectorized code
SPEC/CFP2017rate - extra/better vector code
CFP2017rate/526.blender_r - extra vectorized code
CFP2017rate/538.imagick_r - extra vectorized code
CINT2006/403.gcc - extra vectorized code
CINT2006/445.gobmk - extra vectorized code
CINT2006/464.h264ref - extra vectorized code
CINT2006/483.xalancbmk - small variations
CINT2017rate/525.x264_r - better vectorization