
Attention head implementation is different from the paper #13

Open
Pointy-Hat opened this issue Jan 8, 2024 · 3 comments

@Pointy-Hat

I have gone through the code for the attention head, and it seems to me that it is wildly different from what is described in the paper. It starts with 3x3x1024 convolutions that take up over 50% of all of the model's parameters. The whole thing is bizarre, and it even includes 1x1x1024 convolutions at the end of both sub-heads. Also, the residual connection from the branch outputs is missing.

An illustration that shows the difference:
[Screenshot 2024-01-08 at 16:32:59]

Maybe this is the reason for the non-reproducible results? I don't think it is the case, but I would be curious to find out.
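
For reference, here is a minimal sketch of what a DANet-style attention sub-head commonly looks like (purely illustrative; not copied from this repo or from the paper). It shows why 3x3 convolutions at 1024 channels dominate the parameter count: a single 3x3 conv mapping 1024 to 1024 channels already holds about 9.4M weights.

```python
import torch.nn as nn

class SubHeadSketch(nn.Module):
    """Illustrative DANet-style sub-head: 3x3 conv -> attention -> 1x1 conv."""
    def __init__(self, in_ch: int = 1024, attention: nn.Module = None):
        super().__init__()
        # A single 3x3 conv at 1024 channels has 3*3*1024*1024 ~= 9.4M weights,
        # which is why a few of these dominate the model's parameter budget.
        self.pre = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.attn = attention if attention is not None else nn.Identity()
        self.out = nn.Conv2d(in_ch, in_ch, kernel_size=1)  # 1x1 conv at the end of the sub-head

    def forward(self, x):
        return self.out(self.attn(self.pre(x)))
```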

@dddb11 (Owner) commented Jan 9, 2024

The model code comes from the official codebase (https://github.com/dong03/MVSS-Net), so I did not notice this before. But I checked the code, and the first layer and the residual connection in the DAHead are indeed different from what is described in the paper.
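
To make the residual difference concrete, a hedged sketch of the two fusion styles being compared (illustrative only, not taken from either the paper or the code): the paper's figure, as read above, adds the branch output back in via a skip connection, while the code applies a further 1x1 convolution without one.

```python
import torch

def fuse_with_residual(branch_out: torch.Tensor, attended: torch.Tensor) -> torch.Tensor:
    # paper-style (as described in this thread): skip connection from the branch output
    return branch_out + attended

def fuse_without_residual(attended: torch.Tensor, conv1x1: torch.nn.Module) -> torch.Tensor:
    # code-style (as described in this thread): 1x1 conv on the attended features, no skip
    return conv1x1(attended)
```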

@Pointy-Hat (Author)

I noticed that in the original repo too, but since the original developers don't respond, I chose to share this information with you instead.

@dddb11 (Owner) commented Jan 9, 2024

I did not notice that the DAHead occupied such a large portion of the model's parameters. It is interesting that a vanilla FCN with a DAHead is able to achieve good performance.
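
For anyone who wants to verify the parameter split, here is a short sketch of how it could be measured (the attribute name `dahead` is a hypothetical placeholder for whatever the repo actually calls the attention head):

```python
import torch.nn as nn

def param_share(model: nn.Module, submodule: nn.Module) -> float:
    """Fraction of the model's parameters that belong to the given submodule."""
    total = sum(p.numel() for p in model.parameters())
    part = sum(p.numel() for p in submodule.parameters())
    return part / total

# Hypothetical usage; `model.dahead` stands in for the actual attention head attribute.
# print(f"attention head share of parameters: {param_share(model, model.dahead):.1%}")
```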
