Hey, thanks for your excellent work! I'm currently following the open-sourced code and have a few questions about the training procedure:

I pulled the latest code from GitHub and ran the stage-1 training code on ImageNet from scratch on an 8-GPU A100 machine, but the training log looks abnormal: the recon loss seems to diverge and the visualization results turn bad (see the attached image in the email).

The training script passes `-num_nodes 4`. What does this hyperparameter mean?

The default training script saves checkpoints every n steps rather than keeping the top-k ranked by `val/recon_loss`. Should I switch to a top-k checkpoint callback? (A minimal sketch of what I mean is shown after these questions.)
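For reference, here is a minimal sketch of the kind of top-k callback I have in mind, assuming the training loop is built on PyTorch Lightning (the `num_nodes` flag and the `val/recon_loss` key suggest that). The device count, `save_top_k` value, and filename pattern below are purely illustrative, not the repo's actual settings:

```python
# Minimal sketch (assumptions: PyTorch Lightning, a LightningModule that logs
# "val/recon_loss", one machine with 8 GPUs; all names are illustrative).
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the k best checkpoints ranked by the validation reconstruction loss,
# instead of (or in addition to) saving every n training steps.
topk_ckpt = ModelCheckpoint(
    monitor="val/recon_loss",  # metric logged by the LightningModule
    mode="min",                # lower recon loss is better
    save_top_k=3,              # keep only the 3 best checkpoints
    filename="stage1-{epoch:02d}-{step}",
)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,        # GPUs per machine
    num_nodes=1,      # number of machines; if the released script forwards
                      # `num_nodes 4` here, it targets 4 machines x 8 GPUs = 32 GPUs
    strategy="ddp",
    callbacks=[topk_ckpt],
)
# trainer.fit(model, datamodule=datamodule)  # model/datamodule from the repo
```

If `num_nodes` is indeed passed straight to the Lightning Trainer, it denotes the number of machines rather than GPUs per machine, so running the unchanged multi-node config on a single 8-GPU node would also change the effective global batch size, which might matter for the divergence I'm seeing.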
training: [attachment]
validation: [attachment]