You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
First, for classification, in the initial paper of 2016, 1x1 convolution was used, but in the 2018 paper (Figure2), 3x3 convolution was used.
Are there any spacial reasons for changing numbers? In addition, at the top of page 4, it is more confusing to say that 'we first apply a 1x1 convolution with 30 channels'. I wonder what number is correct.
Second, I want to see why the concated fetures in the Detection Decoder in Figure2 of the 2018 paper are expressed as 39x12x1526.
According to my calculations, ROI Aligh 128 channels is concatenated with 128*8=1024.(+ I wonder why I see 8 instead of 9, except for the existing results in the middle), 500 channels in the Bottleneck block, and finally Prediction 6 channels are concatenated, so the final result is supposed to be 1024+500+6=1530. I will be very grateful if you let me know if I have the wrong part. I have been thinking about this number for a long time, but there is no other conclusion.
I look forward to your reply. Thank you.
The text was updated successfully, but these errors were encountered:
Great Job! But I still have a question. I can't understand the number, too. According to the paper, features transform from (156×48×128) to (39×12×1020) using ROI Align. And I feel confused about this step. If you could expand on it, I would appreciate it. Thanks a lot in advance.
When I read a thesis, I do not understand it.
First, for classification, in the initial paper of 2016, 1x1 convolution was used, but in the 2018 paper (Figure2), 3x3 convolution was used.
Are there any spacial reasons for changing numbers? In addition, at the top of page 4, it is more confusing to say that 'we first apply a 1x1 convolution with 30 channels'. I wonder what number is correct.
Second, I want to see why the concated fetures in the Detection Decoder in Figure2 of the 2018 paper are expressed as 39x12x1526.
According to my calculations, ROI Aligh 128 channels is concatenated with 128*8=1024.(+ I wonder why I see 8 instead of 9, except for the existing results in the middle), 500 channels in the Bottleneck block, and finally Prediction 6 channels are concatenated, so the final result is supposed to be 1024+500+6=1530. I will be very grateful if you let me know if I have the wrong part. I have been thinking about this number for a long time, but there is no other conclusion.
I look forward to your reply. Thank you.
The text was updated successfully, but these errors were encountered: