
Fix ViT-SO400M-14-SigLIP context length #1001

Closed

Conversation

@shkarupa-alex

According to the paper, all SigLIP models have a context length of 64.
@rwightman
Collaborator

[Screenshot 2024-11-29 at 12.22.31 PM.png — upload failed]

@rwightman
Collaborator

Hmm, the screenshot doesn't seem to work. @shkarupa-alex, I take this notebook to be the source of truth on these models, and it says 16: https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb
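
For reference, a minimal sketch of how this can be checked locally, assuming the open_clip tokenizer pads its output to the model's configured context length:

```python
import open_clip

# The tokenizer pads/truncates to the context length in the model config,
# so the output shape reveals what the released config actually uses.
tokenizer = open_clip.get_tokenizer('ViT-SO400M-14-SigLIP')
tokens = tokenizer(["a photo of a cat"])
print(tokens.shape)  # expected (1, 16) if the config keeps context_length = 16
```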

@shkarupa-alex
Author

Yes, but...
The TF ViT uses valid padding in the stem: https://github.com/google-research/big_vision/blob/main/big_vision/models/vit.py#L212
This means the real image size it sees is floor(384 / 14) * 14 = 27 * 14 = 378.
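
A minimal sketch of that valid-padding behavior, assuming a stride-14, padding-0 conv stem (1152 is SO400M's embed dim):

```python
import torch
import torch.nn as nn

# Patch-embedding stem as in the TF ViT: 14x14 conv, stride 14,
# VALID padding (padding=0), so remainder pixels are silently dropped.
stem = nn.Conv2d(3, 1152, kernel_size=14, stride=14, padding=0)

x = torch.randn(1, 3, 384, 384)
patches = stem(x)
print(patches.shape)  # torch.Size([1, 1152, 27, 27]) -> only 27 * 14 = 378 px used
```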

To get the same logits as in pretraining, size 384 should be kept (the discrepancy can be seen as a crop augmentation). But at inference time we want the model to see the whole image, without "cropping".

@rwightman
Collaborator

@shkarupa-alex that's a completely separate issue, and it's why there are both 378x378 and 384x384 versions of those model configs with the same weights. There was a mistake in the original: 378 should have been used with the 14x14 patch size, but they chose 384. Using 378 in preprocessing yields better results, because a 384 image will be truncated to 378, but I left a 384 config so it matches the official implementation.
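
For illustration, the patch arithmetic behind the two configs (plain Python, nothing model-specific):

```python
patch = 14
for size in (378, 384):
    grid = size // patch              # patches per side under valid padding
    dropped = size - grid * patch     # pixels truncated per side
    print(f"{size} -> {grid}x{grid} patches, {dropped} px dropped per side")
# 378 -> 27x27 patches, 0 px dropped per side
# 384 -> 27x27 patches, 6 px dropped per side
```

Both sizes produce the same 27x27 token grid; resizing to 378 just ensures no input pixels are discarded.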

@rwightman closed this on Dec 2, 2024