opt_einsum.contract is slower than raw numpy.einsum #234
Comments
Hi @lin-cp, thanks for the interesting example. For what it's worth, I get a much smaller performance difference (4.5 vs 6.0 seconds), but `contract` is still worse. If you check the optimal path, you see that there is actually no scaling advantage to the optimized path, and only a very small absolute cost decrease. This cost decrease is probably more than offset by the time spent writing the intermediate tensor to memory, thus leading to the worse performance. In general, for these very low arithmetic intensity, small contractions (i.e. 3 terms) where most indices appear on all the tensors, there is little to be gained from optimizing the order of contractions (which is what `opt_einsum` does).
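The path report referred to above can be reproduced with NumPy's built-in `einsum_path` (a sketch using scaled-down stand-in shapes; `opt_einsum.contract_path` prints a similar cost breakdown):

```python
import numpy as np

# Scaled-down stand-in arrays with the same index structure as the issue
# (the real shapes are i=1166, j=8000, k=l=3, m=6).
rng = np.random.default_rng(0)
A = rng.random((11, 80, 3, 3))   # ijkl
B = rng.random((80, 6, 11, 3))   # jmik
C = rng.random((80, 6, 11, 3))   # jmil

path, report = np.einsum_path("ijkl,jmik,jmil->jm", A, B, C, optimize="optimal")
print(report)  # naive vs. optimized FLOP counts, plus the intermediate that gets formed
```

The report shows the naive and optimized scaling side by side, which is how one can see there is no scaling advantage here.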
@jcmgray Thanks for your inspiring explanation. Could you explain more about the intermediate steps whose potentially intensive memory writes worsen the performance of `contract`? I have two further questions:
The intermediate step is as listed above. If you have a 3-term contraction and need to eke out the best performance, you probably need to benchmark things explicitly; the optimization …
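A minimal benchmarking sketch along those lines, using NumPy's own `optimize=` flag (which exercises the same pairwise-contraction machinery that `contract` uses); shapes are scaled down and absolute timings are machine-dependent:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((11, 80, 3, 3))   # ijkl
B = rng.random((80, 6, 11, 3))   # jmik
C = rng.random((80, 6, 11, 3))   # jmil
eq = "ijkl,jmik,jmil->jm"

# optimize=False: one fused einsum loop, no intermediate tensor is written
t_raw = timeit.timeit(lambda: np.einsum(eq, A, B, C, optimize=False), number=20)
# optimize="optimal": pairwise contractions through an intermediate, as contract() does
t_opt = timeit.timeit(lambda: np.einsum(eq, A, B, C, optimize="optimal"), number=20)

print(f"raw einsum: {t_raw:.4f}s   optimized einsum: {t_opt:.4f}s")

# Both orderings must agree numerically
assert np.allclose(
    np.einsum(eq, A, B, C, optimize=False),
    np.einsum(eq, A, B, C, optimize="optimal"),
)
```

Whichever variant wins at these small sizes, the point stands: only an explicit benchmark at your real shapes settles it.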
Dear developer,

I am considering substituting every `numpy.einsum` in my code with `opt_einsum.contract`. However, I found that in some cases `contract` is much slower than raw `numpy.einsum` without any path optimization.

I have three complex-valued tensors with the following shapes:
A: (1166, 8000, 3, 3)
B: (8000, 6, 1166, 3)
C: (8000, 6, 1166, 3)
Then I used:

```python
D = np.einsum("ijkl,jmik,jmil->jm", A, B, C)
```

and

```python
from opt_einsum import contract
D = contract("ijkl,jmik,jmil->jm", A, B, C)
```
Using `contract` is about 3 to 4 times slower than `np.einsum` (e.g. ~58 s vs ~16 s, respectively). According to the opt_einsum documentation, if I don't specify the `optimize` parameter, the default is `optimize='auto'`, which keeps the path-finding time below around 1 ms.

I am trying to understand why in this case opt_einsum performs worse than the raw einsum of numpy. Any suggestion or comment?
Best,
Changpeng