Any suggestion about gradient accumulation and BatchNorm? #1847
Unanswered
tangjiasheng
asked this question in
Q&A
Replies: 0 comments
If I use DeepSpeed directly on a regular CNN model with BatchNorm (BN), are there any recommendations for how to set it up?
For example, is it correct to use gradient accumulation to reach a large effective batch size and so avoid unstable BN statistics? Or is there no need to address this at all?
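
To make the question concrete, here is a minimal pure-Python sketch (no DeepSpeed or PyTorch involved) of what training-mode BN computes. It assumes the usual BN behavior of normalizing each forward pass with that pass's own mean and variance. The activation values and the `batch_norm` helper are hypothetical and only for illustration.

```python
from statistics import fmean, pvariance


def batch_norm(values, eps=1e-5):
    """Training-mode BN for one channel, without the affine scale/shift.

    Normalizes the values with the mean and (biased) variance of the
    values passed in, i.e. the statistics of this forward pass only.
    """
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Hypothetical activations of one channel for a "large" batch of 8 samples.
full_batch = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]

# Case 1: a single forward pass over all 8 samples.
full_out = batch_norm(full_batch)

# Case 2: gradient accumulation with 2 micro-batches of 4 samples each.
# Each micro-batch is a separate forward pass, so BN sees only 4 samples.
micro_batches = [full_batch[:4], full_batch[4:]]
accum_out = [y for mb in micro_batches for y in batch_norm(mb)]

# Largest element-wise gap between the two normalizations.
max_diff = max(abs(a - b) for a, b in zip(full_out, accum_out))
print("max difference between full-batch and micro-batch BN:", max_diff)
```

In this sketch the two outputs differ, because accumulation sums gradients across micro-batches but does not merge their BN statistics. Under that assumption, the batch size BN effectively sees is the per-GPU micro-batch size, not the accumulated batch size. If the micro-batch is too small for stable statistics, options that exist in PyTorch include `torch.nn.SyncBatchNorm` (statistics shared across processes) and `torch.nn.GroupNorm` (statistics independent of batch size); whether either is appropriate here depends on the model and setup.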