Is the delayed scaling computation overlapped with another computation? #1231
Unanswered
avizon-aws asked this question in Q&A
For the delayed scaling recipe to be efficient, the scales for the next iteration are computed beforehand. I think the scales are computed here:
TransformerEngine/transformer_engine/pytorch/fp8.py
Line 314 in f8eb799
But I don't understand whether this computation runs in parallel with another computation to save time. Could anyone shed some light on this? Also, where is the amax calculation for the input and weights done?
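To make the question concrete, here is a minimal sketch of what I understand delayed scaling to mean. It is not TransformerEngine's code: the `DelayedScaler` class, its `step` method, and the `history_len` and `margin` parameters are hypothetical names for illustration, and the formula `scale = fp8_max / amax / 2**margin` is my assumption about the recipe. The scale applied in the current iteration comes from amax values recorded in earlier iterations, and the update for the next iteration happens afterwards.

```python
from collections import deque

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


class DelayedScaler:
    """Hypothetical illustration of delayed scaling, not the library's API."""

    def __init__(self, history_len=4, margin=0):
        self.amax_history = deque(maxlen=history_len)  # recent per-step amax values
        self.margin = margin
        self.scale = 1.0  # scale used until any history exists

    def step(self, tensor_values):
        # The scale used for this iteration was computed in a previous one.
        scale_used = self.scale
        # Record the current amax; it only influences later iterations.
        self.amax_history.append(max(abs(v) for v in tensor_values))
        amax = max(self.amax_history)
        if amax > 0.0:
            # Assumed update rule: map the observed amax onto the FP8 range.
            self.scale = FP8_E4M3_MAX / amax / (2 ** self.margin)
        return scale_used


scaler = DelayedScaler()
first = scaler.step([1.0, -2.0])   # uses the initial scale
second = scaler.step([0.5])        # uses the scale derived from step one
```

In this sketch the update is purely sequential. My question is whether the real implementation runs the equivalent of the update step concurrently with other work.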