Online hard-example mining/examining under multi-GPU dp mode #1170
-
Background
Hi, I'm trying to track the prediction of each individual sample during training_step/validation_step. The main purpose is to do online hard-example mining/examining. One way I found to do this is to make the input to training_step/validation_step carry the sample-id information, for example the file name, so I made the input a dictionary. Example Code
The input dict works on a single GPU but fails under multi-GPU dp
It also took me some time to realize that every value in the input dictionary must be a torch.Tensor, not a list of strings; otherwise, when training under multi-GPU dp mode, the list objects are not split across devices properly. With that change, the input dict works on both a single GPU and multi-GPU dp
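A minimal pure-Python sketch of the workaround described above (all names are illustrative, and the scatter function is only a stand-in for what dp does internally): instead of putting filename strings in the batch dict, pass integer sample ids, which in real code would be a torch.Tensor that dp can split along dim 0, and map them back to filenames after the step.

```python
# Hypothetical sketch: dp scatters torch.Tensor values along dim 0,
# but leaves Python lists of strings intact, so we pass integer ids
# instead of filenames and look the names up again afterwards.
filenames = ["img_000.png", "img_001.png", "img_002.png", "img_003.png"]

def make_batch(indices):
    # In real code this would be e.g. {"x": images, "idx": torch.tensor(indices)};
    # only tensor values are split correctly across GPUs under dp.
    return {"idx": list(indices)}

def scatter(batch, n_gpus):
    # Stand-in for how dp splits a tensor along dim 0 across devices.
    ids = batch["idx"]
    chunk = (len(ids) + n_gpus - 1) // n_gpus
    return [{"idx": ids[i * chunk:(i + 1) * chunk]} for i in range(n_gpus)]

def recover_filenames(sub_batch):
    # Map the scattered integer ids back to sample file names.
    return [filenames[i] for i in sub_batch["idx"]]
```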
Currently, I still have some doubts about this approach...
Replies: 3 comments
-
@neggert @jeffling @jeremyjordan pls ^^
-
have you considered using a library such as pytorch-metric-learning? in general, it would look something like
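The original code example was not preserved in this thread. As a minimal pure-Python stand-in for what a within-batch miner does (in pytorch-metric-learning this role is played by its miner classes, used together with a loss function), with all names illustrative:

```python
def mine_hard_examples(per_sample_losses, sample_ids, k=2):
    # Rank the samples in this batch by loss and keep the k hardest.
    ranked = sorted(zip(per_sample_losses, sample_ids), reverse=True)
    return [sid for _, sid in ranked[:k]]
```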
this does mining within each batch that you pass in. i'm not sure where you're doing the mining currently, but it seems suspicious to be appending data to a class attribute (…)
-
Thanks for the reply! My original purpose is to pick out and record the hard samples during training/validation after every epoch, which is why I append the results to the lightning-module instance. Thanks for pointing out that this design would fail on multi-GPU ddp mode. I didn't know about pytorch-metric-learning before; it seems to be exactly the kind of library I should look at. Really appreciate it!
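One way to avoid appending to a class attribute is to return per-sample results from each step and aggregate them once at epoch end (in PyTorch Lightning, the step-output list is handed to the epoch-end hook). A pure-Python sketch of that pattern, with hypothetical names and plain lists standing in for tensors:

```python
def validation_step(batch_losses, batch_ids):
    # Return per-sample results instead of mutating self.* state,
    # which is unsafe under ddp where each process has its own copy.
    return {"loss": batch_losses, "ids": batch_ids}

def validation_epoch_end(outputs, k=2):
    # Aggregate across all batches, then pick the k hardest samples.
    all_pairs = [(l, i) for out in outputs
                 for l, i in zip(out["loss"], out["ids"])]
    all_pairs.sort(reverse=True)
    return [i for _, i in all_pairs[:k]]
```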