training on custom data #5

shankar-anantak opened this issue May 1, 2024 · 12 comments

@shankar-anantak

Hello,

I am trying to train the model on my custom data captured from a phone camera. It pre-processed fine with COLMAP; however, when I try to train the model:

python3 apps/train.py --cfg config/example/test/train.yml split train
[Config] merge from parent file: config/example/test/dataset.yml
[Config] merge from parent file: config/example/test/level_of_gaussian.yml
[Config] merge from parent file: config/example/test/stage_8_4.yml
Key is not in the template: split
[Config] replace key $root
[Config] replace key $scale3d
[Config] replace key $root
[Config] replace key $scale3d
[Config] replace key $PLYNAME
[Config] replace key $PLYNAME
[Config] replace key $PLYNAME
[Config] replace key $xyz_scale
[Config] replace key $scale3d
[Config] replace key $PLYNAME
[Config] replace key $xyz_scale
[Config] replace key $max_steps
[Config] replace key $dataset
[Config] replace key $RGB_RENDER_L1_SSIM
[Config] replace key $NAIVE_STAGE
[Config] replace key $val_dataset
Using GPUs: 0
Write to output/example/test/log
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] mean: -0.276, 0.500, -1.018
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] std: 1.480, 1.085, 3.215
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=1 49864/125443
bounds: [[-1.756, -0.584, -4.233], [1.204, 1.585, 2.197]]
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=2 114402/125443
bounds: [[-3.236, -1.669, -7.449], [2.683, 2.670, 5.412]]
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=3 122545/125443
bounds: [[-4.715, -2.754, -10.664], [4.163, 3.755, 8.627]]
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] z_min: -53.713, z_max: 23.659
[Load PLY] load from ply: /home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz
[Load PLY] min: [-10.02181557 -10.31714249 -53.71318683], max: [18.95321076 24.50246681 23.65868778]
[Load PLY] scale: 0.0003, 22.6722, mean = 0.0294
[GaussianPoint] scales: [0.0003~0.0294~22.6722]
[GaussianPoint] -> scales: [0.0074~0.0255~0.1176]
>>> Code 3062 files has been copied to output/example/test/log/code_backup_20240501-190914
[ImageDataset] set scales: [1, 2, 4, 8], crop size: [-1, -1]
[ImageDataset] cache dir: /home/dev/dev/data/2023-12-18_15.06.36/cache
Traceback (most recent call last):
  File "/home/dev/dev/LoG/apps/train.py", line 157, in <module>
    main()
  File "/home/dev/dev/LoG/apps/train.py", line 130, in main
    dataset = load_object(cfg.train.dataset.module, cfg.train.dataset.args)
  File "/home/dev/dev/LoG/LoG/utils/config.py", line 61, in load_object
    obj = getattr(module, name)(**extra_args, **module_args)
  File "/home/dev/dev/LoG/LoG/dataset/colmap.py", line 159, in __init__
    centers = np.stack([-info['camera']['R'].T @ info['camera']['T'] for info in infos], axis=0)
  File "/home/dev/miniconda3/envs/LoG/lib/python3.10/site-packages/numpy/core/shape_base.py", line 445, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

I modified the dataset path in config/example/test/dataset.yml to match my input data path.

Are there further steps required for custom data? Your advice would be appreciated.

Thanks

@chingswy
Member

chingswy commented May 2, 2024

This issue arises from the validation settings. You can resolve it by deleting the `val:` entry and its related properties in the config/example/test/train.yml file.
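For anyone hitting the same trace: the ValueError comes from np.stack being called on an empty list, i.e. the dataset section (here the validation one) matched no cameras. A minimal standalone sketch of that failure mode (the "center" field name is just illustrative):

```python
# Minimal reproduction of the ValueError in the traceback above: numpy
# refuses to stack an empty list, which is what happens when a dataset
# section matches no cameras at the configured path.
import numpy as np

infos = []  # no cameras were found for this split
try:
    centers = np.stack([info["center"] for info in infos], axis=0)
except ValueError as e:
    print(e)  # need at least one array to stack
```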

@shankar-anantak
Author

Thank you for the response. I followed your recommendation, deleting `val:` and its properties from the train.yml config:

parents:
  - config/example/test/dataset.yml
  - config/example/test/level_of_gaussian.yml
  - config/example/test/stage_8_4.yml

exp: output/example/test/log
gpus: [0]

log_interval: 1000
save_interval: 10_000

max_steps: 750

RGB_RENDER_L1_SSIM:
  module: LoG.render.renderer.NaiveRendererAndLoss
  args:
    use_origin_render: False
    use_randback: True

train:
  dataset: $dataset
  render: $RGB_RENDER_L1_SSIM
  stages: $NAIVE_STAGE
  init:
    method: scale_min
    dataset_state:
      scale: 4

However, I still get the same error:

(LoG) dev@instance-20240430-202938:~/dev/LoG$ python3 apps/train.py --cfg config/example/test/train.yml split train
[Config] merge from parent file: config/example/test/dataset.yml
[Config] merge from parent file: config/example/test/level_of_gaussian.yml
[Config] merge from parent file: config/example/test/stage_8_4.yml
Key is not in the template: split
[Config] replace key $root
[Config] replace key $scale3d
[Config] replace key $PLYNAME
[Config] replace key $PLYNAME
[Config] replace key $PLYNAME
[Config] replace key $xyz_scale
[Config] replace key $scale3d
[Config] replace key $PLYNAME
[Config] replace key $xyz_scale
[Config] replace key $max_steps
[Config] replace key $dataset
[Config] replace key $RGB_RENDER_L1_SSIM
[Config] replace key $NAIVE_STAGE
Using GPUs: 0
Write to output/example/test/log
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] mean: -0.276, 0.500, -1.018
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] std: 1.480, 1.085, 3.215
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=1 49864/125443
bounds: [[-1.756, -0.584, -4.233], [1.204, 1.585, 2.197]]
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=2 114402/125443
bounds: [[-3.236, -1.669, -7.449], [2.683, 2.670, 5.412]]
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=3 122545/125443
bounds: [[-4.715, -2.754, -10.664], [4.163, 3.755, 8.627]]
[/home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz] z_min: -53.713, z_max: 23.659
[Load PLY] load from ply: /home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz
[Load PLY] min: [-10.02181557 -10.31714249 -53.71318683], max: [18.95321076 24.50246681 23.65868778]
[Load PLY] scale: 0.0003, 22.6722, mean = 0.0294
[GaussianPoint] scales: [0.0003~0.0294~22.6722]
[GaussianPoint] -> scales: [0.0074~0.0255~0.1176]
>>> Code 3062 files has been copied to output/example/test/log/code_backup_20240503-161612
[ImageDataset] set scales: [1, 2, 4, 8], crop size: [-1, -1]
[ImageDataset] cache dir: /home/dev/dev/data/2023-12-18_15.06.36/cache
Traceback (most recent call last):
  File "/home/dev/dev/LoG/apps/train.py", line 157, in <module>
    main()
  File "/home/dev/dev/LoG/apps/train.py", line 130, in main
    dataset = load_object(cfg.train.dataset.module, cfg.train.dataset.args)
  File "/home/dev/dev/LoG/LoG/utils/config.py", line 61, in load_object
    obj = getattr(module, name)(**extra_args, **module_args)
  File "/home/dev/dev/LoG/LoG/dataset/colmap.py", line 159, in __init__
    centers = np.stack([-info['camera']['R'].T @ info['camera']['T'] for info in infos], axis=0)
  File "/home/dev/miniconda3/envs/LoG/lib/python3.10/site-packages/numpy/core/shape_base.py", line 445, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

@chingswy
Member

chingswy commented May 3, 2024

It seems that the dataset can't find any camera parameters. Can you show your folder structure?

@shankar-anantak
Author

shankar-anantak commented May 3, 2024

Sure,

(LoG) dev@instance-20240430-202938:~/dev/LoG$ ls
LoG  LoG.egg-info  README.md  apps  assets  config  docs  output  requirements.txt  setup.py  submodules
(LoG) dev@instance-20240430-202938:~/dev/LoG$ ls ../data/2023-12-18_15.06.36/sparse/0/
cameras.bin  extri.yml  images.bin  intri.yml  points3D.bin  project.ini  sparse.npz  sparse.ply
(LoG) dev@instance-20240430-202938:~/dev/LoG$ 

My LoG root: /home/dev/dev/LoG
My COLMAP output: /home/dev/dev/data/2023-12-18_15.06.36/sparse/0

Paths in the dataset.yml:

root: /home/dev/dev/data/2023-12-18_15.06.36
PLYNAME: /home/dev/dev/data/2023-12-18_15.06.36/sparse/0/sparse.npz

Your help is greatly appreciated

@chingswy
Member

chingswy commented May 3, 2024

You can try to remove data/2023-12-18_15.06.36/cache and data/2023-12-18_15.06.36/cache.pkl and retry.
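A small sketch of that cleanup in Python, using the paths from this thread (a plain rm -rf on the cache folder and cache.pkl does the same thing):

```python
# Delete the cached undistorted images and the camera cache so the dataset
# rebuilds them from the COLMAP output on the next run.
import os
import shutil

root = "/home/dev/dev/data/2023-12-18_15.06.36"
shutil.rmtree(os.path.join(root, "cache"), ignore_errors=True)
cache_pkl = os.path.join(root, "cache.pkl")
if os.path.exists(cache_pkl):
    os.remove(cache_pkl)
```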

@shankar-anantak
Author

After removing the cache, here is the output:

python3 apps/train.py --cfg config/example/test/train.yml split train
[Config] merge from parent file: config/example/test/dataset.yml
[Config] merge from parent file: config/example/test/level_of_gaussian.yml
[Config] merge from parent file: config/example/test/stage_8_4.yml
Key is not in the template: split
[Config] replace key $root
[Config] replace key $scale3d
[Config] replace key $PLYNAME
[Config] replace key $PLYNAME
[Config] replace key $PLYNAME
[Config] replace key $xyz_scale
[Config] replace key $scale3d
[Config] replace key $PLYNAME
[Config] replace key $xyz_scale
[Config] replace key $max_steps
[Config] replace key $dataset
[Config] replace key $RGB_RENDER_L1_SSIM
[Config] replace key $NAIVE_STAGE
Using GPUs: 0
Write to output/example/test/log
[/home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz] mean: -0.276, 0.500, -1.018
[/home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz] std: 1.480, 1.085, 3.215
[/home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=1 49864/125443
bounds: [[-1.756, -0.584, -4.233], [1.204, 1.585, 2.197]]
[/home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=2 114402/125443
bounds: [[-3.236, -1.669, -7.449], [2.683, 2.670, 5.412]]
[/home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz] sigma=3 122545/125443
bounds: [[-4.715, -2.754, -10.664], [4.163, 3.755, 8.627]]
[/home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz] z_min: -53.713, z_max: 23.659
[Load PLY] load from ply: /home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz
[Load PLY] min: [-10.02181557 -10.31714249 -53.71318683], max: [18.95321076 24.50246681 23.65868778]
[Load PLY] scale: 0.0003, 22.6722, mean = 0.0294
[GaussianPoint] scales: [0.0003~0.0294~22.6722]
[GaussianPoint] -> scales: [0.0074~0.0255~0.1176]
>>> Code 3062 files has been copied to output/example/test/log/code_backup_20240503-165112
[ImageDataset] set scales: [1, 2, 4, 8], crop size: [-1, -1]
[ImageDataset] cache dir: /home/dev/dev/LoG/data/2023-12-18_15.06.36/cache
Loaded 677 cameras from /home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0
scale3d = 1.0
[ImageDataset] init camera out-1
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-1.JPG
[ImageDataset] init camera out-10
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-10.JPG
[ImageDataset] init camera out-100
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-100.JPG
[ImageDataset] init camera out-101
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-101.JPG
[ImageDataset] init camera out-102
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-102.JPG
[ImageDataset] init camera out-103
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-103.JPG
[ImageDataset] init camera out-104
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-104.JPG
[ImageDataset] init camera out-105
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-105.JPG
[ImageDataset] init camera out-106
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-106.JPG
[ImageDataset] init camera out-107
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-107.JPG
[ImageDataset] init camera out-108
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-108.JPG
[ImageDataset] init camera out-109
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-109.JPG
[ImageDataset] init camera out-11
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-11.JPG

...
...
...


Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-228.JPG
[ImageDataset] init camera out-229
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-229.JPG
[ImageDataset] init camera out-23
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-23.JPG
[ImageDataset] init camera out-230
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-230.JPG
[ImageDataset] init camera out-231
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-231.JPG
[ImageDataset] init camera out-232
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-232.JPG
[ImageDataset] init camera out-233
Not exists: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/out-233.JPG
[ImageDataset] init camera out-234
^CTraceback (most recent call last):
  File "/home/dev/dev/LoG/apps/train.py", line 157, in <module>
    main()
  File "/home/dev/dev/LoG/apps/train.py", line 130, in main
    dataset = load_object(cfg.train.dataset.module, cfg.train.dataset.args)
  File "/home/dev/dev/LoG/LoG/utils/config.py", line 61, in load_object
    obj = getattr(module, name)(**extra_args, **module_args)
  File "/home/dev/dev/LoG/LoG/dataset/colmap.py", line 135, in __init__
    camera = self.check_undis_camera(camname, cameras_cache, camera_dis, share_camera)
  File "/home/dev/dev/LoG/LoG/dataset/colmap.py", line 88, in check_undis_camera
    cameras_cache[cache_camname] = self.init_camera(camera_undis)
  File "/home/dev/dev/LoG/LoG/dataset/colmap.py", line 74, in init_camera
    mapx, mapy = cv2.initUndistortRectifyMap(camera['K'], camera['dist'], None, newK, (width, height), 5)
KeyboardInterrupt

I'm unsure how the camera init is failing here,

[Load PLY] load from ply: /home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0/sparse.npz
[Load PLY] min: [-10.02181557 -10.31714249 -53.71318683], max: [18.95321076 24.50246681 23.65868778]
[Load PLY] scale: 0.0003, 22.6722, mean = 0.0294
[GaussianPoint] scales: [0.0003~0.0294~22.6722]
[GaussianPoint] -> scales: [0.0074~0.0255~0.1176]

>>> Code 3062 files has been copied to output/example/test/log/code_backup_20240503-165112
[ImageDataset] set scales: [1, 2, 4, 8], crop size: [-1, -1]
[ImageDataset] cache dir: /home/dev/dev/LoG/data/2023-12-18_15.06.36/cache
Loaded 677 cameras from /home/dev/dev/LoG/data/2023-12-18_15.06.36/sparse/0

It appears we are able to find the other paths.

I confirmed all images are in here: /home/dev/dev/LoG/data/2023-12-18_15.06.36/images/..

(LoG) dev@instance-20240430-202938:~/dev/LoG$ ls data/2023-12-18_15.06.36/images
out-1.jpg    out-140.jpg  out-182.jpg  out-223.jpg  out-265.jpg  out-306.jpg  out-348.jpg  out-39.jpg   out-430.jpg  out-472.jpg  out-513.jpg  out-555.jpg  out-597.jpg  out-638.jpg  out-68.jpg
out-10.jpg   out-141.jpg  out-183.jpg  out-224.jpg  out-266.jpg  out-307.jpg  out-349.jpg  out-390.jpg  out-431.jpg  out-473.jpg  out-514.jpg  out-556.jpg  out-598.jpg  out-639.jpg  out-680.jpg
out-100.jpg  out-142.jpg  out-184.jpg  out-225.jpg  out-267.jpg  out-308.jpg  out-35.jpg   out-391.jpg  out-432.jpg  out-474.jpg  out-515.jpg  out-557.jpg  out-599.jpg  out-64.jpg   out-681.jpg
out-101.jpg  out-143.jpg  out-185.jpg  out-226.jpg  out-268.jpg  out-309.jpg  out-350.jpg  out-392.jpg  out-433.jpg  out-475.jpg  out-516.jpg  out-558.jpg  out-6.jpg    out-640.jpg  out-69.jpg
out-102.jpg  out-144.jpg  out-186.jpg  out-227.jpg  out-269.jpg  out-31.jpg   out-351.jpg  out-393.jpg  out-434.jpg  out-476.jpg  out-517.jpg  out-559.jpg  out-60.jpg   out-641.jpg  out-7.jpg

Again, your assistance is greatly appreciated!

@chingswy
Member

chingswy commented May 4, 2024

Hello, your image extension is .jpg instead of .JPG. You should modify this in dataset.yml.
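If it is easier than editing dataset.yml, an alternative is to rename the images so they match the paths the loader is probing (the "Not exists: .../out-1.JPG" lines above suggest the path is built from the camera name plus the configured extension):

```python
# Rename out-*.jpg to out-*.JPG so the files match the paths the dataset is
# looking for. This is only an alternative to changing the extension in
# dataset.yml; the path is taken from this thread.
from pathlib import Path

images = Path("/home/dev/dev/LoG/data/2023-12-18_15.06.36/images")
for p in sorted(images.glob("*.jpg")):
    p.rename(p.with_suffix(".JPG"))
```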

@shankar-anantak
Author

Thank you for helping me resolve that silly mistake. Unfortunately, I have a new issue:

write cache to  /home/dev/dev/LoG/data/2023-12-18_15.06.36/cache.pkl
[ImageDataset] offset: [ 0.00241288  0.02408803 -0.06257018], radius: 6.018952590965555
[ImageDataset] init dataset with 677 images
[ImageDataset] set scale 4, crop_size: [-1, -1], downsample_scale: 1
initialize the model: 100%|██████████████████████████████████████████████████████████████████████████████| 677/677 [00:01<00:00, 557.18it/s]
[LoG] minimum scales: [0.0004~0.0027~0.0818]
Traceback (most recent call last):
  File "/home/dev/dev/LoG/apps/train.py", line 157, in <module>
    main()
  File "/home/dev/dev/LoG/apps/train.py", line 139, in main
    trainer.init(dataset)
  File "/home/dev/dev/LoG/LoG/utils/trainer.py", line 177, in init
    self.model.at_init_final()
  File "/home/dev/dev/LoG/LoG/model/level_of_gaussian.py", line 326, in at_init_final
    self.counter.radius3d_max.fill_(self.gaussian.xyz_scale * 0.2)
TypeError: can't multiply sequence by non-int of type 'float'

Again, your assistance is truly appreciated

@chingswy
Member

You can check self.gaussian.xyz_scale in this line.
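For reference, that TypeError means xyz_scale ended up as a sequence (for example a one-element list coming out of the YAML config) rather than a scalar. A minimal sketch of the failure and a simple guard, assuming that is the cause:

```python
# A Python list multiplied by a float raises exactly the error in the
# traceback above; the value presumably needs to be a scalar.
xyz_scale = [1.0]  # hypothetical value as it might come out of the config
try:
    xyz_scale * 0.2
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'float'

# Collapsing a one-element sequence to a float makes the multiplication valid.
if isinstance(xyz_scale, (list, tuple)) and len(xyz_scale) == 1:
    xyz_scale = float(xyz_scale[0])
print(xyz_scale * 0.2)  # 0.2
```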

@luoww1992

luoww1992 commented May 14, 2024

@chingswy
I am using a dataset from the internet; this is my dataset:
Link: https://pan.baidu.com/s/1PMnA-ibSCqNEmvuCb05cPA
Extraction code: 2ttz
(shared from Baidu Netdisk)
I changed the dataset dir, deleted the val args in train.yaml, and deleted the pkl file and cache folder before running.
Then the training error:

[ImageDataset] undistort and scale 107 images
100%|████████████████████████████████████████████████████████████████████████████████| 107/107 [00:35<00:00,  2.99it/s]
write cache to  D:\GS_Pro\LoG\data\boli\cache.pkl
[ImageDataset] offset: [3.07092454e-02 2.35425679e-05 2.92656670e-02], radius: 5.4546816443016715
[ImageDataset] init dataset with 107 images
Base iteration: 200
[ImageDataset] set scale 1, crop_size: [-1, -1], downsample_scale: 1
initialize the model: 100%|██████████████████████████████████████████████████████████| 107/107 [00:17<00:00,  6.20it/s]
[LoG] minimum scales: [0.0002~0.0006~0.0044]
[Corrector] init view correction: 107
[ImageDataset] set partial indices 107
quick view:  10%|███████                                                              | 11/107 [00:06<00:59,  1.63it/s]
[ImageDataset] set partial indices 107
> Run stage: init. 30000 iterations
[ImageDataset] set scale 8, crop_size: [-1, -1], downsample_scale: 1
[SparseOptimizer] xyz_scale: 1.0, steps: 150000, lr 0.00016->1.6e-06
[SparseOptimizer] scaling: 0.005 -> 0.005
[LoG] optimizer setup: max steps = 150000
[Counter] reset counter -> 96007
[Corrector] view correction optimizer setup 0.001
Traceback (most recent call last):
  File "D:\GS_Pro\LoG\apps\train.py", line 169, in <module>
    main()
  File "D:\GS_Pro\LoG\apps\train.py", line 166, in main
    trainer.fit(dataset)
  File "d:\gs_pro\log\LoG\utils\trainer.py", line 487, in fit
    for iteration, data in enumerate(trainloader):
  File "G:\miniconda3\envs\log\lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    return self._get_iterator()
  File "G:\miniconda3\envs\log\lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "G:\miniconda3\envs\log\lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
    w.start()
  File "G:\miniconda3\envs\log\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "G:\miniconda3\envs\log\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "G:\miniconda3\envs\log\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "G:\miniconda3\envs\log\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "G:\miniconda3\envs\log\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Trainer.train_loader.<locals>.worker_init_fn'

(log) D:\GS_Pro\LoG>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "G:\miniconda3\envs\log\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "G:\miniconda3\envs\log\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

It looks like the dataset is loading incorrectly.

@chingswy
Member

Hello, I am able to load your data and train normally on my computer, but I only have a Linux testing environment. Your issue might be related to Windows or a specific problem with PyTorch.

@jorenmichels

> @chingswy I am using a dataset from the internet … then the training error: … AttributeError: Can't pickle local object 'Trainer.train_loader.<locals>.worker_init_fn' … It looks like the dataset is loading incorrectly.

There are several places in the code where the number of CPU workers (num_workers) is specified. If you set all of them to 1, it should work on Windows.
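For background, on Windows the DataLoader starts its workers with the spawn method, so everything handed to a worker (including worker_init_fn) has to be picklable, and a function defined inside another function, like Trainer.train_loader.<locals>.worker_init_fn, is not. A minimal sketch of the pattern; num_workers=0 keeps loading in the main process so nothing needs to be pickled:

```python
# Sketch of the failure mode: with worker processes under the "spawn" start
# method (the Windows default), the DataLoader must pickle worker_init_fn,
# and a local closure cannot be pickled.
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader(dataset, num_workers):
    def worker_init_fn(worker_id):  # local closure -> not picklable
        torch.manual_seed(worker_id)
    return DataLoader(dataset, batch_size=1,
                      num_workers=num_workers,
                      worker_init_fn=worker_init_fn)

dataset = TensorDataset(torch.zeros(4, 3))
# num_workers=0: no worker processes are spawned, so nothing is pickled.
for batch in make_loader(dataset, num_workers=0):
    pass
```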
