Why not use nn.MultiheadAttention in vit? #283
Answered by rwightman · ZhiyuanChen asked this question in Q&A
It seems PyTorch has provided …
Answered by rwightman · Nov 21, 2020
@ZhiyuanChen I wasn't quite sure how the official version would look w.r.t. the attention module, or how close it'd be to the PyTorch impl, when I started. Plus it was pretty straightforward to just implement it as is. I don't think the current PyTorch impl is much faster. The Apex one would likely be, but it's harder to work with.
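For context, here is a minimal sketch of the kind of fused-qkv attention block timm implements for ViT, next to the roughly equivalent `nn.MultiheadAttention` call. This is illustrative, not timm's exact code: names, defaults, and the `batch_first=True` flag (added to PyTorch after this discussion) are assumptions.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """ViT-style self-attention sketch: one fused qkv projection,
    scaled dot-product attention, then an output projection."""
    def __init__(self, dim, num_heads=8, qkv_bias=False):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)  # fused q, k, v
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        # (B, N, 3C) -> (3, B, heads, N, head_dim)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(x)

# Shape-level comparison (hypothetical sizes, ViT-B/16-like):
x = torch.randn(2, 197, 768)
custom = Attention(768, num_heads=12, qkv_bias=True)
builtin = nn.MultiheadAttention(768, num_heads=12, batch_first=True)
y1 = custom(x)
y2, _ = builtin(x, x, x, need_weights=False)
assert y1.shape == y2.shape == x.shape
```

Note that the two are functionally similar but not weight-compatible out of the box: `nn.MultiheadAttention` keeps its projections in `in_proj_weight`/`out_proj`, so loading a timm checkpoint into it would require remapping parameter names.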
Answer selected by ZhiyuanChen