Training using Nvidia A100 GPU #114
Comments
@ajaysurya1221 The A100 uses the Ampere architecture, which has compute capability 8.0 (sm_80). PICK's implementation requires cu101, but CUDA 10.1 predates Ampere, so some of the CUDA computations do not run the way they should on this GPU. You can try a different PyTorch version built against a newer CUDA (e.g., cu111).
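To confirm the mismatch before reinstalling, something like the sketch below may help. It compares the GPU's compute capability with the architectures the installed PyTorch binary was compiled for. The helper name `build_has_kernels_for` is made up for this example, and the "same major version" rule is a simplification that ignores PTX forward compatibility. `torch.cuda.get_arch_list()` is not available in older PyTorch releases, so the script falls back to printing only the versions in that case.

```python
import re


def build_has_kernels_for(capability, arch_list):
    """Rough check: does a build compiled for `arch_list` cover this GPU?

    capability: (major, minor) tuple, e.g. (8, 0) for an A100.
    arch_list:  strings such as "sm_70" or "compute_80".
    Simplification (an assumption of this sketch): any compiled sm_XY with
    the same major version as the GPU counts as a match.
    """
    major, _minor = capability
    built_majors = set()
    for arch in arch_list:
        match = re.match(r"sm_(\d+)", arch)
        if match:
            built_majors.add(int(match.group(1)) // 10)
    return major in built_majors


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch:", torch.__version__, "| built with CUDA:", torch.version.cuda)
        if not torch.cuda.is_available():
            print("No CUDA device visible")
        else:
            cap = torch.cuda.get_device_capability(0)
            print("GPU 0:", torch.cuda.get_device_name(0), "| capability:", cap)
            # get_arch_list is missing in older releases, so look it up defensively.
            get_arch_list = getattr(torch.cuda, "get_arch_list", None)
            if get_arch_list is None:
                print("This PyTorch cannot report its compiled architectures")
            else:
                archs = get_arch_list()
                print("Compiled for:", archs)
                print("Covers this GPU:", build_has_kernels_for(cap, archs))
```

If the last line prints `False`, or the reported CUDA version is older than 11.0, the installed build most likely has no kernels for the A100, which would fit the behaviour described in this issue.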
Hi, I'm using one A100 GPU to train PICK and I've set distributed to false.
[2022-06-08 01:41:58,561 - train - INFO] - One GPU or CPU training mode start...
[2022-06-08 01:41:58,565 - train - INFO] - Dataloader instances created. Train datasets: 100 samples Validation datasets: 20 samples.
[2022-06-08 01:41:59,276 - train - INFO] - Model created, trainable parameters: 68571598.
[2022-06-08 01:41:59,277 - train - INFO] - Optimizer and lr_scheduler created.
[2022-06-08 01:41:59,277 - train - INFO] - Max_epochs: 35 Log_per_step: 20 Validation_per_step: 100.
[2022-06-08 01:41:59,277 - train - INFO] - Training start...
[2022-06-08 01:41:59,289 - trainer - WARNING] - Training is using GPU 0!
I've been stuck here for a long time, and after 10-15 minutes it throws a cuDNN error. Is there any solution?
CUDA version = 10.1 and PyTorch = 1.5.1+cu101