[QUESTION] What's the internal difference for training when setting only "fp8-format" or setting "fp8-format"+"bf16" #1116
Unanswered
dong-liuliu asked this question in Q&A

Your question

I'm trying to train GPT/LLaMA on top of Megatron-LM, but I'm confused about FP8 performance: setting the FP8 format parameters together with "--bf16" performs much better than the same setup without "--bf16". So what is the difference between the two inside Megatron-LM? When fp8 and bf16 are set together, will Megatron-LM split some of the computation to BF16 where that is more efficient, and the rest to FP8 for higher throughput?
-
Only "fp8-format": FP32 + FP8 (and correspondingly, "fp8-format" + "bf16": BF16 + FP8). You could consider FP8 as an additional feature on top of your current (BF16 or no BF16) training recipe.