How does the Detection Decoder module work? #46

jmlb · 2018-04-24T16:13:38Z

Hi,
I am totally confused about how the Detection Decoder works.

In the output of the module, what are the 2 labels/classes?
According to the paper, the 1st and 2nd channels of the prediction output give the confidence that an object of interest is present at a particular location.
What are the objects of interest: car/road?
Fig. 3 shows 3 crossed gray cells: are those the cells in the "I don't care" area?
Is it expected that the top of the image (the sky) is not considered part of the "I don't care" area?

The last 4 channels are the bounding box coordinates (x0, y0, h, w).

Are those coordinates at the scale of the input image, or at the scale of the (39x12) feature map?
What is the "delta prediction" (the residue)? Is it the correction to be applied to the coarse estimate of the bounding box (from the prediction)?
What is the difference between the outputs of the Segmentation Decoder and the Detection Decoder? I understand that the Segmentation Decoder outputs a mask for the 2 classes, but I would have thought that the Detection Decoder outputs the coordinates of the bounding boxes.

Thank you
