-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for llvm.vector.reduce.* intrinsics #2198
Conversation
42175d3
to
a89ef46
Compare
@@ -4023,6 +4024,90 @@ SPIRVValue *LLVMToSPIRVBase::transIntrinsicInst(IntrinsicInst *II, | |||
transValue(II->getArgOperand(0), BB), | |||
transValue(II->getArgOperand(1), BB), BB); | |||
} | |||
case Intrinsic::vector_reduce_add: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some details in a comment about the implementation here.
Thanks
|
||
target triple = "spir64-unknown-unknown" | ||
|
||
; CHECK-SPIRV-DAG: TypeInt [[#I8TYPE:]] 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do appreciate the thoroughness of this test. But I am not sure if it is really necessary to check for all data types here. If the implementation passes/fails for a particular data type, is there a reason why it will have a different result for some other data type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO there is. The major necessity here comes from the fact that translator during translation is a stateful machine. This actually requires from the test to contain at least some kind of combination of types to be sure that at least in some ways this stateful behavior is covered by tests. The second reason is to be 100% sure that the types aren't changing of the implementation behavior. And third (imo. weakest) is that the api calls which are used by implementation are also tested for correctness in that context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about correctness as we are not actually running anything and creating results in the LIT tests. My argument is this. LIT tests that are being added here test specifically the sequence of SPIR-V instructions generated by 'your' implementation and if your implementation breaks one test that checks SPIR-V code generation for one type, it is going to break for all types.
Also, I am ok to have this discussion offline and not block this PR.
Thanks
@@ -0,0 +1,552 @@ | |||
; RUN: llvm-as %s -o %t.bc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the amount of effort that has already gone into writing this test, I will categorize this comment as 'feel free to ignore' and 'non-blocking'. Do we want to test the reverse translated LLVM code here?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO. no. These are covered by appropriate instruction tests written in the past - such effort will be redundant when such tests are in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I do not think we have instruction level tests anywhere. Please feel free to point me to such tests.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits. Overall change looks good. If possible, it will be a good idea to verify the functionality using E2E tests though it is not required for this PR to be approved.
Also, for future reference, I see two places where the complexity of the testing can be reduced: (1) I think it's sufficient to test for a single data type. If the implementation failed in one data type, i suspect it's going to fail for all (2) I also think testing for add/and/mul/or/xor can be represented using a single test as the implementation is same for all these ops. Similarly for smax/smin/umax/umin.
Good job on writing the tests.
Thanks
This PR is still in Draft state. Sorry, i jumped the gun by approving. Will wait for it to move out of 'draft' state.
A set of llvm.vector.reduce.* intrinsics doesn't have straight forward operation equivalent on the SPIRV side. The easiest solution to this problem is to use scalar operation on each pair of vector elements and repeat until there is only one value.
a89ef46
to
aec34ac
Compare
b6a4f10
to
908f32a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer add/mul, or/and/xor and float add/mul are LGTM. Tomorrow will take a look at min/max
A set of llvm.vector.reduce.* intrinsics doesn't have straight forward operation equivalent on the SPIRV side. The commit needed be split on three sections: Integer, Floating-point and Comparison reductions.
On every category at first we need to extract each value from vector.
For Integer reductions we perform an operation on each pair of vector elements and repeat until there is only one value.
This way emitted code has the least number of dependencies between each operation and maximally utilizes instruction parallelism.
For FP reductions because of non associativity (due to rounding errors) we needed to emit sequential code in which every next operation is based on previous result. This solution is slow but correct. Potential way of improvement Floating-point is to use reassoc flag from -ffast-math.
For Comparison reductions we can utilize similar algorithm like in Integer section with the difference that now we need to emit Comp operation and OpSelect on which result is based like in cpp ternary operation. The rest is the same.