Cannot reproduce without pre-trained ResNet-50 weights of xiao2018simple #16

Open
mimiliaogo opened this issue Sep 7, 2022 · 9 comments

Comments

@mimiliaogo

Hi,
I tried to reproduce Table 8 without the pre-trained ResNet-50 weights of xiao2018simple.
My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml
and the config file is:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.00025 #0.001/4
lr_backbone: 0.0001
lr_dec_factor: 10

However, I got very strange results on 3DPW, shown below (I evaluate after every epoch):
image

Do you have any idea about this?
Thank you!

@hongsukchoi
Owner

Hi,

If you are not using the pretrained backbone, please set 'lr' and 'lr_backbone' to the same value.
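
To make the suggestion concrete, here is a minimal sketch of how the two learning rates would evolve over training. It assumes that lr_dec_factor divides both rates at each epoch listed in lr_dec_epoch; that reading of the config keys is an assumption and is not confirmed in this thread. The helper names step_lr and equalize_backbone_lr are hypothetical and are not part of the repository.

```python
def step_lr(base_lr, epoch, dec_epochs, dec_factor):
    """Learning rate at `epoch` under a step-decay schedule.

    Assumption: the rate is divided by `dec_factor` once for every
    entry of `dec_epochs` that has already been reached.
    """
    n_decays = sum(1 for e in dec_epochs if epoch >= e)
    return base_lr / (dec_factor ** n_decays)


def equalize_backbone_lr(cfg):
    """Return a copy of `cfg` with lr_backbone set equal to lr."""
    out = dict(cfg)
    out["lr_backbone"] = out["lr"]
    return out


# Values taken from the config posted at the top of this issue.
cfg = {
    "lr": 0.00025,
    "lr_backbone": 0.0001,
    "lr_dec_epoch": [30],
    "lr_dec_factor": 10,
}

# Training the backbone from scratch: give it the same rate as the rest.
cfg_scratch = equalize_backbone_lr(cfg)

for epoch in (0, 29, 30, 39):
    head = step_lr(cfg_scratch["lr"], epoch,
                   cfg_scratch["lr_dec_epoch"], cfg_scratch["lr_dec_factor"])
    backbone = step_lr(cfg_scratch["lr_backbone"], epoch,
                       cfg_scratch["lr_dec_epoch"], cfg_scratch["lr_dec_factor"])
    print(f"epoch {epoch:2d}: lr={head:.6f}  lr_backbone={backbone:.6f}")
```

With a lower lr_backbone, a randomly initialized backbone is updated more slowly than the layers on top of it, which is presumably why the two values should match when no pretrained weights are loaded.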

@mimiliaogo
Author

I changed my config file as below:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.0005 
lr_backbone: 0.0005
lr_dec_factor: 10

# modify batch size
train_batch_size: 128
test_batch_size: 128

However, the results were still weird.
image

@hongsukchoi
Owner

Hi,

Yes, the results seem weird.

  1. Are you evaluating on 3DPW-Crowd?

  2. How can you train that fast? I don't remember exactly, but it took me more than 12 hours to train for 6 epochs. You are training for 40 epochs with half the batch size; 2 days should not be enough.

@mimiliaogo
Author

  1. I evaluate on 3DPW, not 3DPW-Crowd.
  2. I used an RTX 3090 with batch size 128. Training takes about 0.91 h per epoch.
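
The two timing figures in this exchange can be compared with a short calculation. This is only a back-of-the-envelope sketch: it treats "more than 12 hours for 6 epochs" as a lower bound of 2 h per epoch and ignores per-epoch evaluation time. The function name total_training_hours is hypothetical.

```python
def total_training_hours(hours_per_epoch, num_epochs):
    """Total wall-clock training time, ignoring evaluation overhead."""
    return hours_per_epoch * num_epochs


END_EPOCH = 40  # end_epoch from the config in this issue

# Reported above: 0.91 h per epoch on an RTX 3090.
rtx3090_hours = total_training_hours(0.91, END_EPOCH)

# Owner's recollection: more than 12 hours for 6 epochs,
# so at least 2 h per epoch (a lower bound, not a measurement).
owner_hours_per_epoch = 12 / 6
owner_hours = total_training_hours(owner_hours_per_epoch, END_EPOCH)

print(f"RTX 3090 run:     about {rtx3090_hours:.1f} h for {END_EPOCH} epochs")
print(f"Owner's estimate: at least {owner_hours:.1f} h for {END_EPOCH} epochs")
```

Under these assumptions the reported run finishes in well under two days, while the owner's rate would need more than three days for the same 40 epochs.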

@hongsukchoi
Owner

Wow, I didn't know the RTX 3090 was that much faster than the RTX 2080 Ti.

I thought you were testing on 3DPW-Crowd, since you are using 3dpw_crowd.yml

> My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml

Can you share your full code via a GitHub repo? Some of the information is confusing, and errors that increase during training are really weird.

@mimiliaogo
Author

Sorry, I pasted the wrong command.
My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw.yml
This is my full code: https://github.com/mimiliaogo/3DCrowdNet-Mimi
Thank you so much!

@hongsukchoi
Owner

hongsukchoi commented Sep 10, 2022

Thanks for sharing the code.
I can't find a critical bug...

Here are a few suggestions.

  1. Could you try testing with test.py? Because you evaluate after every epoch, the test data could be unintentionally overwritten during the process.

  2. Could you visualize the training data? Draw the GT joints and meshes on the images. The data could have been corrupted during download. Also, did you change anything in the MPII.py code?

  3. Could you train with the config below and see the result? It shouldn't take long. The goal is to find out which dataset is causing the increasing error.

trainset_3d: []
trainset_2d: ['MSCOCO']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.001
lr_backbone: 0.001
lr_dec_factor: 10

# modify batch size
train_batch_size: 128
test_batch_size: 128
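
As a complement to inspecting the training data by eye (suggestion 2), a quick automated check can flag samples whose 2D joints fall outside the image. This is only a sketch under assumptions: it takes joints as a list of (x, y) pixel coordinates, which is not necessarily how the dataset loaders in this repository store them, and the function name fraction_out_of_bounds is hypothetical.

```python
def fraction_out_of_bounds(joints_xy, width, height):
    """Fraction of 2D joints that lie outside a width x height image.

    Assumption: `joints_xy` is a list of (x, y) pixel coordinates.
    A high fraction for many samples would hint at corrupted or
    mis-scaled annotations.
    """
    if not joints_xy:
        return 0.0
    outside = sum(
        1 for x, y in joints_xy
        if not (0 <= x < width and 0 <= y < height)
    )
    return outside / len(joints_xy)


# Illustrative placeholder values only, not real annotations.
sample_joints = [(10.0, 20.0), (300.0, 40.0), (128.0, 128.0), (-5.0, 60.0)]
ratio = fraction_out_of_bounds(sample_joints, width=256, height=256)
print(f"{ratio:.2f} of the joints fall outside the image")
```

Running a check like this once per dataset would give a rough signal to compare alongside the visualizations. Some joints can legitimately fall outside the frame (for example, truncated people), so a nonzero fraction alone does not prove corruption.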

@mimiliaogo
Author

Hi,
I tried the config from your suggestion 3, and the results seem normal.
image
So maybe the problem comes from the training data. I will try to visualize it.
BTW, I did not change anything in the MPII.py code.

@mimiliaogo
Author

@hongsukchoi, when I train your model with only Human3.6M or only MuCo, the errors increase in both cases.
I visualized the GT keypoints and joints, and they look normal (maybe a little inaccurate, but mostly right).
However, I still don't know why these two datasets lead to increasing errors...
image
image
