'out' option in 'contract' makes a copy #209
It is likely that the first copy is a transpose, since the GEMM zip index isn't aligned. Please try profiling the following contraction with an aligned zip index:

a = cupy.random.rand(2100, 400, 6)
out = cupy.random.rand(2100, 400, 5)
c2s = cupy.random.rand(6, 5)
contract('ijk,kp->ijp', a, c2s, out=out)
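The two layouts can be compared on the CPU with plain NumPy, which avoids needing a GPU (a minimal sketch with smaller, illustrative shapes; the aligned/misaligned distinction is the same as in the CuPy example above):

```python
import numpy as np
from opt_einsum import contract

# Misaligned layout: the contracted index j sits in the middle of `a`,
# so a GEMM backend must transpose the operand first.
a = np.random.rand(21, 6, 40)
c2s = np.random.rand(6, 5)
out_mis = np.empty((21, 5, 40))
contract('ijk,jp->ipk', a, c2s, out=out_mis)

# Aligned layout: the contracted index k is the last axis of `a`,
# so GEMM can consume the operands directly.
a_t = np.ascontiguousarray(a.transpose(0, 2, 1))  # shape (21, 40, 6)
out_al = np.empty((21, 40, 5))
contract('ijk,kp->ijp', a_t, c2s, out=out_al)

# Both layouts compute the same contraction, just stored differently.
assert np.allclose(out_mis, out_al.transpose(0, 2, 1))
```

Mathematically the two calls produce the same numbers up to a permutation of axes; the difference is only in which memory movements the backend must perform before and after the GEMM.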
Sorry for the late reply; I missed the notification. The above code indeed removes one of the cupy copies.
But there is still a copy operation. It should be possible to avoid it by writing the data to 'out' directly, right? Am I missing something? No copy operation is needed if I contract via
But the new issue is that 'out.flags' then becomes
In general, only BLAS-like interfaces are implemented in ML code bases, which assume aligned GEMM operations like
I see. CuPy can be powered by cuTENSOR, which is not a standard BLAS-like interface, so the copy operation could probably be avoided there. Also, the copy operation is quite slow on GPU. For example, the following contraction
The other problem is mentioned in another issue as well.
I think a lot of what I said in #211 (comment) also applies to the 'out' kwarg: essentially, there is no backend-agnostic way to handle it. Maybe for these types of lower-level functionality/optimization, one would want to extract the contraction path/specification and write a custom kernel.
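Extracting the path and specification is supported directly by opt_einsum via `contract_path` and `contract_expression` (a minimal sketch using small NumPy arrays; a custom kernel would then target the reported intermediates):

```python
import numpy as np
from opt_einsum import contract_path, contract_expression

a = np.random.rand(21, 6, 40)
c2s = np.random.rand(6, 5)

# Inspect the chosen contraction path without executing it.
path, info = contract_path('ijk,jp->ipk', a, c2s)
print(info)  # intermediates, FLOP count, largest intermediate, etc.

# Or pre-compile the contraction once from shapes and reuse it;
# the expression object encodes the full specification.
expr = contract_expression('ijk,jp->ipk', a.shape, c2s.shape)
result = expr(a, c2s)
```

The path/expression objects are backend-agnostic, so the same specification can be dispatched to NumPy, CuPy, or a hand-written kernel.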
I see. Those kwargs are indeed out of opt_einsum's scope. Thank you for clarifying.
When 'out' is given, 'contract' is supposed to write the result into 'out' directly. However, in the current version, when 'out' is given, opt_einsum makes two copies. Here are my sample code and the profiling result.

a = cupy.random.rand(2100, 6, 400)
out = cupy.random.rand(2100, 5, 400)
c2s = cupy.random.rand(6, 5)
contract('ijk,jp->ipk', a, c2s, out=out)
==6876== Profiling result:
GPU activities:
Type Time(%) Time Calls Avg Min Max Name
51.39% 1.96283s 20000 98.141us 88.831us 112.29us cupy_copy__float64_float64
48.57% 1.85507s 10000 185.51us 183.07us 192.03us void gemmSN_NN_kernel<double, int=128, int=2, int=4, int=8, int=5, int=4, cublasGemvTensorStridedBatched, cublasGemvTensorStridedBatched>(cublasGemmSmallNParams<double const , cublasGemvTensorStridedBatched, double>)
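Despite the intermediate copies, the result does land in the provided buffer; that can be confirmed on the CPU with plain NumPy (a minimal sketch with smaller shapes, standing in for the CuPy arrays above):

```python
import numpy as np
from opt_einsum import contract

a = np.random.rand(21, 6, 40)
c2s = np.random.rand(6, 5)
out = np.zeros((21, 5, 40))

# `contract` fills the provided `out` buffer; the complaint in this
# issue is about the extra intermediate copies made on the way there.
contract('ijk,jp->ipk', a, c2s, out=out)

# Cross-check against a plain einsum of the same specification.
assert np.allclose(out, np.einsum('ijk,jp->ipk', a, c2s))
```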