
Enable dynamic resolution input for Beit #31053

Merged: 5 commits, Jun 6, 2024

Conversation

@OmarManzoor (Contributor) commented on May 27, 2024

What does this PR do?

Towards #30579
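
For context, here is the behavior this PR enables, sketched as a usage example against the API as merged (the checkpoint name and shapes are illustrative):

```python
import torch
from transformers import BeitModel

model = BeitModel.from_pretrained("microsoft/beit-base-patch16-224")

# An input at a higher resolution than the 224x224 used in pretraining
pixel_values = torch.randn(1, 3, 480, 480)

# With interpolate_pos_encoding=True, the pretrained position encodings
# (and the relative position bias) are resized to match the new patch grid
outputs = model(pixel_values, interpolate_pos_encoding=True)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 901, 768]): 1 CLS token + 30*30 patches
```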


Who can review?

CC: @amyeroberts
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@OmarManzoor (Contributor, Author) commented on May 27, 2024

Hi @amyeroberts

This PR is incomplete right now because I am unsure how to proceed. It seems that BeitEncoder takes the original patch embedding configuration as an input, and hence the original window size:
https://github.com/huggingface/transformers/pull/31053/files#diff-3f84bebd6be8d9c0f5c5068199f5c49eac8489d5fa466fb6fa08b0365e78dba4R679-R685
This window size is used to initialize BeitRelativePositionBias, and the dimensions mismatch when we interpolate the embeddings during the forward pass. Could you guide me on what I am missing?
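
For reference, a minimal sketch of the ViT-style position-embedding interpolation that the forward pass applies (function name and signature are illustrative, not the exact code in this PR):

```python
import math
import torch
import torch.nn.functional as F

def interpolate_pos_encoding(pos_embed: torch.Tensor, height: int, width: int, patch_size: int) -> torch.Tensor:
    """Resize pretrained position embeddings to a new input resolution.

    pos_embed has shape (1, 1 + num_patches, dim); index 0 is the [CLS] embedding.
    """
    num_patches = pos_embed.shape[1] - 1
    new_h, new_w = height // patch_size, width // patch_size
    if new_h * new_w == num_patches:
        return pos_embed  # same patch grid as pretraining, nothing to do

    cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = pos_embed.shape[-1]
    grid = int(math.sqrt(num_patches))  # assumes a square pretraining grid

    # (1, N, dim) -> (1, dim, grid, grid) so F.interpolate can resize the grid
    patch_pos = patch_pos.reshape(1, grid, grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_h, new_w), mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_h * new_w, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)
```

The mismatch described above arises because this only resizes the absolute position embeddings, while BeitRelativePositionBias is still built from the original window size.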

@OmarManzoor changed the title from "Initial attempt" to "Enable dynamic resolution input for Beit" on May 28, 2024
@amyeroberts (Collaborator) left a comment:

Thanks for adding! Just a few small comments.

Review thread on tests/models/beit/test_modeling_beit.py (outdated, resolved)
Comment on lines +561 to +562:

```python
with self.assertRaises(ValueError, msg="doesn't match model"):
    model(pixel_values, interpolate_pos_encoding=False)
```
@amyeroberts (Collaborator):

Just to make sure this still holds if anything happens upstream, and to make things explicit, could you add the following above:

```python
self.assertFalse(processor.do_center_crop)
```
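
A sketch of how the suggested assertion could sit in the test (the surrounding test body is illustrative, not the exact code in this PR):

```python
import torch
from transformers import BeitImageProcessor, BeitModel

def test_inference_interpolate_pos_encoding(self):  # hypothetical test method
    processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
    # Make the no-center-crop assumption explicit, so the check below
    # still holds if the processor's defaults ever change upstream
    self.assertFalse(processor.do_center_crop)

    model = BeitModel.from_pretrained("microsoft/beit-base-patch16-224")
    pixel_values = torch.randn(1, 3, 480, 480)  # larger than the model's 224x224

    # Without interpolation, the mismatched patch grid should raise
    with self.assertRaises(ValueError, msg="doesn't match model"):
        model(pixel_values, interpolate_pos_encoding=False)
```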

Review threads on src/transformers/models/beit/modeling_beit.py (two outdated and resolved; one resolved)
@amyeroberts (Collaborator) commented:

@OmarManzoor Regarding the relative position bias: looking at the modeling code, I think whenever the output of this model is used, it will also need to be interpolated if interpolate_pos_encoding=True.

@OmarManzoor (Contributor, Author) commented on Jun 3, 2024:

> @OmarManzoor Regarding the relative position bias: looking at the modeling code, I think whenever the output of this model is used, it will also need to be interpolated if interpolate_pos_encoding=True.

Could you clarify where exactly this should be added? Do we need to add a new interpolation function that works for BeitSelfAttention?

@amyeroberts (Collaborator) commented:

@OmarManzoor You need to make sure that the relative position biases are interpolated wherever that is needed. This might be as an argument to the relative position class, or within the modules that use its output.

@OmarManzoor (Contributor, Author) commented:

@amyeroberts How do I calculate the interpolations for the relative position biases, similar to how we calculated them for the embeddings?
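
A common answer, also used in the original BEiT fine-tuning code, is to treat the learned bias table as a 2D grid over relative offsets and resize that grid bicubically, just as the patch position embeddings are resized. A minimal sketch with hypothetical names (BEiT's three extra class-token entries in the table are left out here):

```python
import torch
import torch.nn.functional as F

def interpolate_relative_position_bias_table(
    bias_table: torch.Tensor,  # ((2*H - 1) * (2*W - 1), num_heads)
    old_window: tuple,         # patch grid at pretraining resolution, e.g. (14, 14)
    new_window: tuple,         # patch grid at the new input resolution
) -> torch.Tensor:
    """Resize the learned bias table to cover the new range of relative offsets."""
    num_heads = bias_table.shape[1]
    old_h, old_w = 2 * old_window[0] - 1, 2 * old_window[1] - 1
    new_h, new_w = 2 * new_window[0] - 1, 2 * new_window[1] - 1

    # View the flat table as one 2D offset grid per head: (1, num_heads, old_h, old_w)
    table_2d = bias_table.T.reshape(1, num_heads, old_h, old_w)

    # Bicubic resize to the new offset grid, mirroring the patch-embedding case
    resized = F.interpolate(table_2d, size=(new_h, new_w), mode="bicubic", align_corners=False)

    # Flatten back to ((2*H' - 1) * (2*W' - 1), num_heads)
    return resized.reshape(num_heads, new_h * new_w).T
```

The relative position index then has to be recomputed for the new window so it addresses the resized table.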

@OmarManzoor marked this pull request as ready for review on Jun 4, 2024.
@amyeroberts (Collaborator) left a comment:

Looks great - thanks for adding this! Only a small nit on the docstring for data2vec.

Comment on a diff hunk:

```diff
@@ -670,6 +753,7 @@ def forward(
     head_mask: Optional[torch.Tensor] = None,
     output_attentions: Optional[bool] = None,
     output_hidden_states: Optional[bool] = None,
+    interpolate_pos_encoding: bool = False,
```
@amyeroberts (Collaborator):

Should be added to DATA2VEC_VISION_INPUTS_DOCSTRING.
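
A sketch of the docstring entry, following the wording transformers uses for this argument in other vision models (the variable name below just marks the fragment; the real change extends the existing docstring):

```python
# Fragment to add under DATA2VEC_VISION_INPUTS_DOCSTRING's Args section:
INTERPOLATE_POS_ENCODING_DOC = r"""
        interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):
            Whether to interpolate the pre-trained position encodings.
"""
```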

@amyeroberts (Collaborator) commented:

Thanks again for adding this!

@amyeroberts merged commit 6811839 into huggingface:main on Jun 6, 2024 (18 checks passed).
@OmarManzoor deleted the dynamic_resolution_beit branch on Jun 7, 2024.
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request on Jun 11, 2024, with the following commit messages:

* Initial attempt
* Updates: PR suggestions
* Interpolate the relative position bias when interpolate_pos_encoding is True
* Add slow tag for the added tests
* Add in DATA2VEC_VISION_INPUTS_DOCSTRING
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request on Jun 14, 2024 (same commit messages as above).