Looking for help while debugging #2

Open

hhhljf opened this issue Sep 28, 2023 · 3 comments

Comments


hhhljf commented Sep 28, 2023

Thanks for your novel work. I am trying to train the model on the Mayo dataset, following the training command you provided. When running models/networks.py, I hit the following error:
Traceback (most recent call last):
File "/dg_hpc/CNG/lijf/ASCON-main/train.py", line 72, in
model.optimize_parameters() # calculate loss functions, get gradients, update network weights
File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 128, in optimize_parameters
self.loss_D = self.compute_D_loss()
File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 160, in compute_D_loss
self.loss_D = self.MAC_Net(self.real_B, self.fake_B.detach())
File "/dg_hpc/CNG/lijf/ASCON-main/models/ASCON_model.py", line 189, in MAC_Net
feat_k_pool_1, sample_ids, sample_local_ids, sample_top_idxs = self.netProjection_target(patch_size,feat_k_1, self.num_patches,None,None,None,pixweght=None)
File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/dg_workfs/CNG/lijf/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/dg_hpc/CNG/lijf/ASCON-main/models/networks.py", line 130, in forward
N_patches=num_patches[feat_id]
IndexError: list index out of range
I have checked the value of num_patches. It seems num_patches should be an array, but I find it is a predefined int value of 256. How can I eliminate this error? Thanks!
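
For reference, here is a minimal sketch of what I think the mismatch is (illustrative values only, not the actual code in networks.py): the IndexError would appear if num_patches reaches forward() as a list with fewer entries than there are layers selected in nce_layers.

```python
# Illustrative sketch, not ASCON's actual code: indexing a per-layer option that
# has fewer entries than the number of selected layers reproduces the error.
nce_layers = [1, 4]          # two selected feature maps -> feat_id runs over 0 and 1

num_patches = [256]          # a single default value: num_patches[1] would raise
                             # IndexError: list index out of range
num_patches = [32, 512]      # one patch count per nce layer avoids the error

for feat_id in range(len(nce_layers)):
    N_patches = num_patches[feat_id]   # 32 for feat_id 0, 512 for feat_id 1
    print(feat_id, N_patches)
```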

hao1635 (Owner) commented Oct 1, 2023


I'm sorry for the incomplete training instruction. Please try:

python train.py --name ASCON(experiment_name) --model ASCON --netG ESAU --dataroot /data/zhchen/Mayo2016_2d(path to images) --nce_layers 1,4 --layer_weight 1,1 --num_patches 32,512 --k_size 3,7 --lr 0.0002 --gpu_ids 6,7 --print_freq 25 --batch_size 8 --lr_policy cosine
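
For clarity, here is a rough sketch of how comma-separated options such as --num_patches 32,512 end up as per-layer lists (the helper name parse_int_list is made up, and the repo's actual parsing may differ); the important part is that --num_patches and --k_size each need one entry per layer in --nce_layers:

```python
# Hypothetical option parsing, shown only to illustrate the shape the options need.
def parse_int_list(value: str) -> list[int]:
    """Turn a comma-separated string such as '32,512' into [32, 512]."""
    return [int(v) for v in value.split(',')]

nce_layers  = parse_int_list('1,4')      # layers whose features feed MAC_Net
num_patches = parse_int_list('32,512')   # patches sampled at each selected layer
k_size      = parse_int_list('3,7')      # neighborhood size per selected layer

# Each per-layer option must match the number of selected layers so that
# num_patches[feat_id] (and the other per-layer lookups) stay in range.
assert len(num_patches) == len(nce_layers) == len(k_size)
```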

hhhljf (Author) commented Oct 7, 2023

Thanks for your reply. Now I have run into another problem: during training, self.fake_B and self.real_B always seem to be identical. I subtracted the two tensors element-wise and all elements of the result are zero; the RMSE (root mean squared error) is also zero. However, loss_D and loss_G decrease gradually. I wonder what the problem is. The training log is as follows (a quick check is sketched after the log):

(epoch: 1, iters: 525,loss_D: 1.751378, loss_G: 0.174635,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 524/2167 [08:33<24:31, 1.12it/s]

(epoch: 1, iters: 550,loss_D: 1.712759, loss_G: 0.171654,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 549/2167 [08:56<23:45, 1.13it/s]

(epoch: 1, iters: 575,loss_D: 1.753926, loss_G: 0.174744,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 574/2167 [09:20<24:23, 1.09it/s]

(epoch: 1, iters: 600,loss_D: 1.735380, loss_G: 0.173059,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 599/2167 [09:43<22:16, 1.17it/s]

(epoch: 1, iters: 625,loss_D: 1.736408, loss_G: 0.170298,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 624/2167 [10:05<21:03, 1.22it/s]

(epoch: 1, iters: 650,loss_D: 1.700382, loss_G: 0.181855,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 649/2167 [10:30<29:49, 1.18s/it]

(epoch: 1, iters: 675,loss_D: 1.726071, loss_G: 0.174785,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 674/2167 [10:52<21:37, 1.15it/s]

(epoch: 1, iters: 700,loss_D: 1.677032, loss_G: 0.169842,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 699/2167 [11:17<22:23, 1.09it/s]

(epoch: 1, iters: 725,loss_D: 1.752134, loss_G: 0.172651,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 724/2167 [11:41<25:46, 1.07s/it]

(epoch: 1, iters: 750,loss_D: 1.749419, loss_G: 0.169668,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000) | 749/2167 [12:05<22:34, 1.05it/s]

(epoch: 1, iters: 775,loss_D: 1.707352, loss_G: 0.167989,,train_psnr: inf, train_ssim: 1.0000,train_rmse:.0.00000000)
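
One way to narrow this down is to compare the tensors directly inside one training iteration. This is just a debugging sketch: the helper below is not part of the repo, and it assumes the model also keeps the network input as self.real_A, which may be named differently here. Exact equality (RMSE 0, PSNR inf) usually means the same data is being compared with itself somewhere, either because the dataloader pairs an image with itself or because the metric code compares a tensor against itself.

```python
import torch

def report_diff(name: str, a: torch.Tensor, b: torch.Tensor) -> None:
    """Print whether two tensors are exactly equal and how far apart they are."""
    diff = (a - b).abs()
    rmse = torch.sqrt(((a - b) ** 2).mean())
    print(f"{name}: identical={torch.equal(a, b)}, "
          f"max|diff|={diff.max().item():.6g}, rmse={rmse.item():.6g}")

# Inside one iteration of optimize_parameters(), something like:
# report_diff("real_A vs real_B", self.real_A, self.real_B)
# report_diff("fake_B vs real_B", self.fake_B, self.real_B)
#
# If real_A already equals real_B, the dataloader is feeding the same image as
# input and target; if only the metric tensors match, check where train_psnr /
# train_rmse are computed.
```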

hhhljf (Author) commented Oct 7, 2023

I am not using the same training dataset as yours. My training data is normalized to the range 0-1, and I recover the original values when computing SSIM and RMSE.
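
For concreteness, the recovery step I mean looks roughly like this, assuming a fixed HU window was used for the 0-1 normalization (the window values below are placeholders, not the ones from this repo or my preprocessing):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

MIN_HU, MAX_HU = -1000.0, 400.0   # placeholder window; use whatever was used for normalization

def denormalize(img01: np.ndarray) -> np.ndarray:
    """Map a [0, 1]-normalized image back to the original intensity range."""
    return img01 * (MAX_HU - MIN_HU) + MIN_HU

def evaluate(pred01: np.ndarray, target01: np.ndarray):
    pred, target = denormalize(pred01), denormalize(target01)
    data_range = MAX_HU - MIN_HU
    psnr = peak_signal_noise_ratio(target, pred, data_range=data_range)
    ssim = structural_similarity(target, pred, data_range=data_range)
    rmse = float(np.sqrt(np.mean((target - pred) ** 2)))
    return psnr, ssim, rmse
```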
