Loss becomes NaN when setting use_global_stat=True for batchnorm #13902
-
Description

I trained a model and used it to perform prediction. While building the predictor, if I set the argument for_training=False, the prediction result is bad, as bad as predicting with a randomly initialized model.

Environment info (Required)

Package used (Python/R/Scala/Julia):

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):
Build config:

Error Message:

the training log:
Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide a link to the existing example.)

Steps to reproduce

(Paste the commands you ran that produced the error.)
What's wrong with my code?
Replies: 2 comments
-
@FCInter Thank you for submitting the issue! I'm labeling it so the MXNet community members can help resolve it.
-
You can try lowering the learning rate to 1/10 or 1/100 of the original value. When self.use_global_stats is True, the normalized activations are not strictly zero-centered with unit variance, so training is more difficult.
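To illustrate why training gets harder, here is a minimal NumPy sketch (not MXNet's actual implementation; the function name and statistics are made up for clarity) contrasting the two normalization modes. With batch statistics the output is exactly zero-centered with unit variance; with stale global (moving-average) statistics it is not, so gradients can be larger and a smaller learning rate helps.

```python
import numpy as np

def batchnorm(x, moving_mean, moving_var, use_global_stats, eps=1e-5):
    """Hypothetical sketch of BatchNorm normalization (scale/shift omitted)."""
    if use_global_stats:
        # Normalize with stored (global) statistics: output is only
        # approximately zero-centered, since the moving averages can
        # lag behind the current batch's distribution.
        mean, var = moving_mean, moving_var
    else:
        # Normalize with the current batch's own statistics: output is
        # exactly zero-centered with unit variance per feature.
        mean, var = x.mean(axis=0), x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# A batch whose true distribution has drifted away from the stored stats.
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))

# Batch statistics: mean of the output is numerically zero.
y_batch = batchnorm(x, None, None, use_global_stats=False)

# Stale global statistics (here mean 0, var 1): output mean drifts to ~3,
# which is the kind of mismatch that can destabilize training.
y_global = batchnorm(x, moving_mean=0.0, moving_var=1.0,
                     use_global_stats=True)

print(abs(y_batch.mean()))   # close to 0
print(abs(y_global.mean()))  # noticeably larger
```

The larger the gap between the stored statistics and the current batch, the larger the effective activations, which is why scaling the learning rate down by 10x or 100x is a reasonable first thing to try.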