[QUESTION] What's the internal difference for training when setting only "fp8-format" or setting "fp8-format"+"bf16" #1116
Unanswered
dong-liuliu asked this question in Q&A

Your question

I'm trying to train GPT/LLaMA on top of Megatron-LM, but I'm confused about FP8 performance: setting the FP8 format parameters together with "--bf16" performs much better than the same setup without "--bf16". So what is the difference between the two inside Megatron-LM? When fp8 and bf16 are set together, will Megatron-LM split some of the computation to BF16 where that is more efficient, and the rest to FP8 for higher throughput?
-
Only "fp8-format": FP32 + FP8 (and correspondingly, "fp8-format" + "bf16": BF16 + FP8). You could consider FP8 as an additional feature on top of your current (BF16 or no BF16) training recipe.