'out' option in 'contract' makes a copy #209

Closed
wxj6000 opened this issue Nov 24, 2022 · 6 comments

wxj6000 commented Nov 24, 2022

When 'out' is given, 'contract' is supposed to write the result into 'out' directly. However, in the current version, when 'out' is given, opt_einsum makes two copies! Here are my sample code and the profiling result.

import cupy
from opt_einsum import contract

a = cupy.random.rand(2100, 6, 400)
out = cupy.random.rand(2100, 5, 400)
c2s = cupy.random.rand(6, 5)
contract('ijk,jp->ipk', a, c2s, out=out)

==6876== Profiling result:
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  51.39%   1.96283s  20000  98.141us  88.831us  112.29us  cupy_copy__float64_float64
                 48.57%   1.85507s  10000  185.51us  183.07us  192.03us  void gemmSN_NN_kernel<double, int=128, int=2, int=4, int=8, int=5, int=4, cublasGemvTensorStridedBatched, cublasGemvTensorStridedBatched>(cublasGemmSmallNParams<double const, cublasGemvTensorStridedBatched, double>)

dgasmith (Owner) commented
It is likely that the first copy is a transpose, since the GEMM zip (contracted) index isn't aligned in memory. Please try profiling the following contraction, where the zip index is aligned:

a = cupy.random.rand(2100, 400, 6)
out = cupy.random.rand(2100, 400, 5)
c2s = cupy.random.rand(6, 5)
contract('ijk,kp->ijp', a, c2s, out=out)
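
For intuition, here is a minimal sketch of why the unaligned spec 'ijk,jp->ipk' costs an extra kernel (an assumption about the mechanism, not opt_einsum's actual internals): the contracted index j must be a trailing, contiguous axis for the GEMM, so the backend first makes a transposed copy, runs the aligned GEMM, and returns a transposed view.

import cupy

a = cupy.random.rand(2100, 6, 400)
c2s = cupy.random.rand(6, 5)

# Sketch only: the transpose copy below is the extra cupy_copy kernel in the profile.
a_t = cupy.ascontiguousarray(a.transpose(0, 2, 1))  # 'ijk' -> 'ikj' (copy)
res = cupy.tensordot(a_t, c2s, axes=([2], [0]))     # aligned GEMM: 'ikj,jp->ikp'
res = res.transpose(0, 2, 1)                        # 'ikp' -> 'ipk' (view, no copy)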

wxj6000 (Author) commented Dec 26, 2022

Sorry for the late reply; I missed the notification. The code above indeed removes one of the cupy copies:

==1850== Profiling result:
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  77.46%   317.16ms  1000   317.16us  315.68us  330.75us  volta_dgemm_128x64_tt
                 22.40%   91.722ms  1000   91.722us  90.528us  93.727us  cupy_copy__float64_float64

But there is still one copy operation. It should be possible to avoid it by writing the data into 'out' directly, right? Am I missing something?

No copy operation is needed if I contract via

out = contract('ijk,kp->ijp', a, c2s)

==1988== Profiling result:
Type             Time(%)  Time      Calls  Avg       Min       Max       Name
GPU activities:  99.82%   317.26ms  1000   317.26us  315.61us  330.62us  volta_dgemm_128x64_tt
                 0.07%    209.41us  1      209.41us  209.41us  209.41us  void generate_seed_pseudo<rng_config<curandStateXORWOW, curandOrdering=101>>(__int64, __int64, __int64, curandOrdering, curandStateXORWOW*, unsigned int*)

But the new issue is that out.flags becomes

C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False

The element-wise difference between out in the two cases, contract('ijk,kp->ijp', a, c2s, out=out) and out = contract('ijk,kp->ijp', a, c2s), is zero.
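
For reference, a minimal way to reproduce both observations (a sketch, reusing a and c2s from the aligned example above):

out = cupy.empty((2100, 400, 5))
contract('ijk,kp->ijp', a, c2s, out=out)    # with out=: one extra copy kernel
res = contract('ijk,kp->ijp', a, c2s)       # without out=: no copy

print(res.flags)                            # C_CONTIGUOUS, F_CONTIGUOUS, OWNDATA all False
print(float(cupy.abs(out - res).max()))     # 0.0, i.e. identical values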

dgasmith (Owner) commented Jan 4, 2023

In general, ML code bases implement only BLAS-like interfaces, which assume aligned GEMM operations such as ij,jk->ik. Any time you do not have an aligned GEMM (one whose zip index is contiguous in memory), a copy or a for loop is required. Passing out is generally as fast as not passing it, since the time it takes to allocate an array is usually much less than the computation time.

For this binary contraction, opt_einsum will simply call cupy.tensordot.
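
One hedged reading of where the remaining copy comes from (an assumption about the dispatch, not a guaranteed code path): the GEMM writes its result into freshly allocated memory, and that result is then copied into the user-supplied buffer.

res = cupy.tensordot(a, c2s, axes=([2], [0]))   # GEMM into a newly allocated array
out[...] = res                                  # the remaining cupy_copy kernel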

wxj6000 (Author) commented Jan 4, 2023

I see. CuPy can be backed by cuTENSOR, which is not a standard BLAS-like interface; the copy could probably be avoided there.

Also, the 'copy' operation is quite slow on the GPU. For example, with the following setup:

import cupy
from opt_einsum import contract

a = cupy.random.rand(800, 800, 800)
b = cupy.random.rand(800, 800)
c = cupy.zeros([800, 800, 800])

contract('ijk,kl->ijl', a, b) takes 48 ms, while contract('ijk,kl->ijl', a, b, out=c) takes 75 ms on an A100.
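
To reproduce such timings, one option is a sketch like the following (assuming CuPy >= 10, which provides cupyx.profiler.benchmark; the 48 ms / 75 ms figures above are from an A100 and will vary with hardware):

from cupyx.profiler import benchmark

# Time the contraction with and without a preallocated output buffer.
print(benchmark(contract, ('ijk,kl->ijl', a, b), n_repeat=100))
print(benchmark(contract, ('ijk,kl->ijl', a, b), kwargs={'out': c}, n_repeat=100))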

The other problem I mentioned is also discussed in another issue: #211

jcmgray (Collaborator) commented Jan 6, 2023

I think a lot of what I said in #211 (comment) also applies to the 'out' kwarg: essentially, there is no backend-agnostic way to handle it.

Maybe for these kinds of lower-level functionality/optimizations, one would want to extract the contraction path/specification and write a custom kernel.
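
For anyone taking that route, opt_einsum already exposes the path and per-step specification through opt_einsum.contract_path, so a starting point could look like this sketch:

import cupy
import opt_einsum

a = cupy.random.rand(800, 800, 800)
b = cupy.random.rand(800, 800)

# Inspect the contraction order and per-step specs before writing
# a custom kernel around them.
path, info = opt_einsum.contract_path('ijk,kl->ijl', a, b)
print(path)   # pairwise contraction order, e.g. [(0, 1)]
print(info)   # scaling, FLOP estimate, and intermediate shapes per step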

wxj6000 (Author) commented Jan 7, 2023

I see. Those kwargs are indeed outside opt_einsum's scope. Thank you for clarifying.

wxj6000 closed this as completed Jan 7, 2023