Can this module be used for training on an Ada GPU? #770
Unanswered
scissorstail asked this question in Q&A
When I actually ran fp8 training in place of bf16 on two RTX 6000 GPUs, the accuracy was too low, and NaN values appeared once the model had just over 8 layers. To prevent this, I had to raise norm_eps from the original 1e-5 to 5e-2. I might be doing something wrong, but what should I check?
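
For reference, here is a rough sketch of why the size of norm_eps matters. It assumes the norm layers use the standard RMSNorm form `x / sqrt(mean(x^2) + eps)` with no learned scale, which may not match this module exactly. The function names and the sample activations are hypothetical and only there for illustration.

```python
import math

def rms_norm(x, eps):
    # Assumed standard RMSNorm without a learned scale:
    # y = x / sqrt(mean(x^2) + eps)
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

def output_rms(x, eps):
    # Root mean square of the normalized output; close to 1.0
    # when eps is negligible compared with mean(x^2).
    y = rms_norm(x, eps)
    return math.sqrt(sum(v * v for v in y) / len(y))

# Hypothetical low-magnitude activations (not measured from any model).
small = [0.01, -0.02, 0.015, -0.005]

for eps in (1e-5, 5e-2):
    print(f"eps={eps:g}  output RMS={output_rms(small, eps):.4f}")
```

If the output RMS drops well below 1 at 5e-2, as it does for these sample values, then the larger eps is rescaling the activations rather than only guarding against division by zero. That might hide the NaN without fixing whatever causes it.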