Cannot train the model #13

Open
deepu-rajesh opened this issue Mar 10, 2024 · 2 comments

Comments

@deepu-rajesh

I got the following error when trying to train the model:
2024-03-10 10:12:48.171105: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-10 10:12:48.171155: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-10 10:12:48.172506: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-10 10:12:49.612674: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[**] create folder ../experiments/aotgan_places2_pconv256
Traceback (most recent call last):
  File "/content/drive/MyDrive/CODES/AOTGAN/src/train.py", line 51, in <module>
    main_worker(0, 1, args)
  File "/content/drive/MyDrive/CODES/AOTGAN/src/train.py", line 30, in main_worker
    trainer = Trainer(args)
  File "/content/drive/MyDrive/CODES/AOTGAN/src/trainer/trainer.py", line 23, in __init__
    self.dataloader = create_loader(args)
  File "/content/drive/MyDrive/CODES/AOTGAN/src/data/__init__.py", line 14, in create_loader
    data_loader = DataLoader(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 349, in __init__
    sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 140, in __init__
    raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
ValueError: num_samples should be a positive integer value, but got num_samples=0

@Muyangjiadebianmu

Have you resolved the problem? I am struggling with this.

@minhanh29

minhanh29 commented Mar 31, 2024

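Hi, you have to set --data_train to the name of the subdirectory (under --dir_image) that stores your images, and put your masks into a "pconv" folder under --dir_mask (I know, the naming is confusing). If those paths do not match any files, the dataset ends up empty, which is what num_samples=0 is reporting. Below is how the data is loaded.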
https://github.com/researchmm/AOT-GAN-for-Inpainting/blob/418034627392289bdfc118d62bc49e6abd3bb185/src/data/dataset.py#L21C2-L24C84

self.image_path = []
for ext in ['*.jpg', '*.png']: 
    self.image_path.extend(glob(os.path.join(args.dir_image, args.data_train, ext)))
self.mask_path = glob(os.path.join(args.dir_mask, args.mask_type, '*.png'))

https://github.com/researchmm/AOT-GAN-for-Inpainting/blob/418034627392289bdfc118d62bc49e6abd3bb185/src/data/dataset.py#L48C1-L55C51

if self.mask_type == 'pconv':
    index = np.random.randint(0, len(self.mask_path))
    mask = Image.open(self.mask_path[index])
    mask = mask.convert('L')
else:
    mask = np.zeros((self.h, self.w)).astype(np.uint8)
    mask[self.h//4:self.h//4*3, self.w//4:self.w//4*3] = 1
    mask = Image.fromarray(mask).convert('L')  # the linked source passes `m` here, which is undefined; `mask` appears to be intended
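
To see what the non-pconv branch produces, here is a minimal sketch of the same slicing, assuming only NumPy. `center_box_mask` is a hypothetical helper name for illustration and is not part of the repository.

```python
import numpy as np


def center_box_mask(h, w):
    """Return an (h, w) uint8 array with a centered box of ones.

    Mirrors the slicing in the else-branch quoted above: the box spans
    rows h//4 .. (h//4)*3 - 1 and columns w//4 .. (w//4)*3 - 1.
    """
    mask = np.zeros((h, w)).astype(np.uint8)
    mask[h // 4:h // 4 * 3, w // 4:w // 4 * 3] = 1
    return mask


# Small size so the result is easy to read; training would use image_size.
mask = center_box_mask(8, 8)
print(mask)
```

For an 8x8 input the box covers rows and columns 2 through 5, so a 256x256 mask would cover 64 through 191 on each axis.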

For example, if you put the images into ./data/images and the masks into ./data/pconv, the expected command would be:

python train.py --dir_image ./data --dir_mask ./data --data_train images --mask_type pconv --image_size 256 --save_every 5000
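
Before launching training, it may help to confirm that these paths actually match files. The following is a minimal sketch that reuses the glob patterns quoted above. `count_training_files` is a hypothetical helper, not part of the repository, and the `./data`, `images` and `pconv` values are simply the ones from the example command.

```python
import os
from glob import glob


def count_training_files(dir_image, data_train, dir_mask, mask_type):
    """Count files matched by the same glob patterns the dataset code uses."""
    image_paths = []
    for ext in ['*.jpg', '*.png']:
        image_paths.extend(glob(os.path.join(dir_image, data_train, ext)))
    mask_paths = glob(os.path.join(dir_mask, mask_type, '*.png'))
    return len(image_paths), len(mask_paths)


if __name__ == '__main__':
    # Values taken from the example command; adjust to your own layout.
    n_images, n_masks = count_training_files('./data', 'images', './data', 'pconv')
    print(f'images found: {n_images}, masks found: {n_masks}')
    if n_images == 0:
        print('No images matched; this is the likely cause of num_samples=0.')
    if n_masks == 0:
        print('No masks matched; check --dir_mask and --mask_type.')
```

Note that the patterns only match `*.jpg` and `*.png`, so files named `.jpeg` (or `.JPG` on a case-sensitive filesystem) would not be counted.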
