
Attention head implementation is different from the paper #13

Open
Pointy-Hat opened this issue Jan 8, 2024 · 3 comments

@Pointy-Hat

I have gone through the code for the attention head, and it seems to me that it is wildly different from what is described in the paper. It starts with 3x3x1024 convolutions that take up over 50% of all of the model's parameters. The whole thing is bizarre, and it even includes 1x1x1024 convolutions at the end of both sub-heads. Also, the residual connection from the branch outputs is missing.

An illustration that shows the difference:
[Screenshot 2024-01-08 at 16:32:59]

Maybe this is the reason for the non-reproducible results? I don't think it is the case, but I would be curious to find out.
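
For reference, here is a minimal sketch of what a DANet-style attention sub-head commonly looks like (purely illustrative; not copied from this repo or from the paper). It shows why 3x3 convolutions at 1024 channels dominate the parameter count: a single 3x3 conv mapping 1024 to 1024 channels already holds about 9.4M weights.

```python
import torch.nn as nn

class SubHeadSketch(nn.Module):
    """Illustrative DANet-style sub-head: 3x3 conv -> attention -> 1x1 conv."""
    def __init__(self, in_ch: int = 1024, attention: nn.Module = None):
        super().__init__()
        # A single 3x3 conv at 1024 channels has 3*3*1024*1024 ~= 9.4M weights,
        # which is why a few of these dominate the model's parameter budget.
        self.pre = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.attn = attention if attention is not None else nn.Identity()
        self.out = nn.Conv2d(in_ch, in_ch, kernel_size=1)  # 1x1 conv at the end of the sub-head

    def forward(self, x):
        return self.out(self.attn(self.pre(x)))
```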

@dddb11 (Owner) commented Jan 9, 2024

The model code comes from the official codebase (https://github.com/dong03/MVSS-Net), so I did not notice this before. But I checked the code, and the first layer and the residual connection in the DAHead are indeed different from what is described in the paper.
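
To make the residual difference concrete, a hedged sketch of the two fusion styles being compared (illustrative only, not taken from either the paper or the code): the paper's figure, as read above, adds the branch output back in via a skip connection, while the code applies a further 1x1 convolution without one.

```python
import torch

def fuse_with_residual(branch_out: torch.Tensor, attended: torch.Tensor) -> torch.Tensor:
    # paper-style (as described in this thread): skip connection from the branch output
    return branch_out + attended

def fuse_without_residual(attended: torch.Tensor, conv1x1: torch.nn.Module) -> torch.Tensor:
    # code-style (as described in this thread): 1x1 conv on the attended features, no skip
    return conv1x1(attended)
```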

@Pointy-Hat (Author)

I noticed that in the original repo too, but since the original developers don't respond, I chose to share this information with you instead.

@dddb11 (Owner) commented Jan 9, 2024

I did not notice that the DAHead occupied such a large portion of the model's parameters. It is interesting that a vanilla FCN with a DAHead is able to achieve good performance.
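
For anyone who wants to verify the parameter split, here is a short sketch of how it could be measured (the attribute name `dahead` is a hypothetical placeholder for whatever the repo actually calls the attention head):

```python
import torch.nn as nn

def param_share(model: nn.Module, submodule: nn.Module) -> float:
    """Fraction of the model's parameters that belong to the given submodule."""
    total = sum(p.numel() for p in model.parameters())
    part = sum(p.numel() for p in submodule.parameters())
    return part / total

# Hypothetical usage; `model.dahead` stands in for the actual attention head attribute.
# print(f"attention head share of parameters: {param_share(model, model.dahead):.1%}")
```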
