Any suggestion about gradient accumulation and BatchNorm? #1847

tangjiasheng · 2022-03-21T10:57:17Z

If I use DeepSpeed directly on a regular CNN model with BatchNorm (BN) layers, are there any recommendations for how to use it?
For example, is it right to use gradient accumulation to reach a large batch size and thereby avoid unstable BN? Or is there no need to address this at all?
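One point that bears on the question: in training mode, BatchNorm normalizes with the mean and variance of the batch seen in each forward pass, and gradient accumulation only sums gradients across several such passes. The minimal pure-Python sketch below illustrates this for a single channel under simplifying assumptions (no learnable scale/shift, no running statistics). The helper names `batchnorm_forward` and `forward_with_accumulation` and the activation values are hypothetical, chosen only for illustration.

```python
import math

def batchnorm_forward(xs, eps=1e-5):
    """Training-mode BatchNorm over a 1-D list of activations (one channel),
    without the learnable scale/shift: uses ONLY this batch's mean/variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # biased variance, as BN uses when normalizing
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def forward_with_accumulation(xs, micro_batch_size):
    """Each micro-batch is normalized independently, which is what happens
    when a large batch is split into gradient-accumulation steps."""
    out = []
    for start in range(0, len(xs), micro_batch_size):
        out.extend(batchnorm_forward(xs[start:start + micro_batch_size]))
    return out

# Hypothetical activations for one channel: 8 samples in total.
activations = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]

full = batchnorm_forward(activations)              # one forward pass over all 8 samples
accum = forward_with_accumulation(activations, 4)  # 2 accumulation steps of 4 samples each

print([round(v, 3) for v in full])
print([round(v, 3) for v in accum])
```

The two outputs differ: with accumulation, each group of 4 is centered on its own mean, so the gap between the two groups disappears after normalization. In DeepSpeed terms, each forward pass sees `train_micro_batch_size_per_gpu` samples per GPU, so raising `gradient_accumulation_steps` enlarges the effective batch for the optimizer but not the batch BN normalizes over. If per-GPU micro-batches are small, common alternatives are `torch.nn.SyncBatchNorm` (statistics shared across GPUs) or a batch-independent layer such as GroupNorm; whether either suits a given model is an assumption to verify.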

Replies: 0 comments