Need help on Micro Batch Size, Global Batch Size, Pipeline Parallel size calculation #6795
-
I am using the https://huggingface.co/nvidia/GPT-2B-001 model, and I need to do data & model parallelization. For that I used:

config.model.micro_batch_size = 4
config.model.tensor_model_parallel_size = 1

Total no. of GPUs = 4 (a single machine with multiple GPUs).

With that I got an error from https://github.com/NVIDIA/NeMo/blob/v1.18.1/nemo/collections/nlp/modules/common/megatron/megatron_init.py

Can anyone help me to solve this? Thanks
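For reference, a minimal sketch of the configuration described above, assuming OmegaConf-style config objects and the standard NeMo Megatron key names; pipeline_model_parallel_size is shown at its usual default of 1, which is an assumption since the question does not set it:

```python
from omegaconf import OmegaConf

# The two values quoted above, plus the pipeline-parallel key at its assumed default.
config = OmegaConf.create({
    "model": {
        "micro_batch_size": 4,
        "tensor_model_parallel_size": 1,
        "pipeline_model_parallel_size": 1,  # assumption: not set explicitly in the question
    }
})
num_gpus = 4  # single machine with 4 GPUs
```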
-
In short:

TP_size * PP_size * DP_size == num_GPUs

So with 4 GPUs, if you want to do PP=4, you cannot also do DP.

For the equation there (the batch-size check, where num_micro_batches is the gradient-accumulation factor):

global_batch_size == micro_batch_size * DP_size * num_micro_batches
Therefore they should be equal.
Now that you understand the equation, the issue is probably because the DP_size is wrong. A possible reason is that you didn't partition the model into PP=4, so the program uses DP=4. Simply setting PP=4 in the config wouldn't work; you need to use …
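To make the arithmetic concrete, here is a small self-contained sketch (illustrative names only, not NeMo's actual API; the global_batch_size value is hypothetical) showing how DP_size falls out of the other two sizes, and why the batch-size check then complains:

```python
def derive_dp_size(num_gpus: int, tp_size: int, pp_size: int) -> int:
    """DP_size is whatever is left after TP and PP claim their GPUs,
    since TP_size * PP_size * DP_size == num_GPUs."""
    assert num_gpus % (tp_size * pp_size) == 0, "num_GPUs must be divisible by TP_size * PP_size"
    return num_gpus // (tp_size * pp_size)

# The setup from the question: 4 GPUs, TP=1, and PP left at its default of 1.
dp_size = derive_dp_size(num_gpus=4, tp_size=1, pp_size=1)
print(dp_size)  # 4 -> the model is replicated on all 4 GPUs (pure data parallelism)

# Batch-size check (global_batch_size = 4 is a hypothetical value for illustration):
micro_batch_size, global_batch_size = 4, 4
if global_batch_size % (micro_batch_size * dp_size) != 0:
    print(f"fails: global_batch_size={global_batch_size} is not a multiple of "
          f"micro_batch_size * DP_size = {micro_batch_size * dp_size}")

# If the model really were partitioned into 4 pipeline stages, DP would collapse to 1,
# and the same global_batch_size would pass the check:
print(derive_dp_size(num_gpus=4, tp_size=1, pp_size=4))  # 1 -> no data parallelism left
```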