fix coca_cfg #691

Closed
wants to merge 1 commit

Conversation

gpucce
Contributor

@gpucce gpucce commented Oct 22, 2023

No description provided.

@gpucce gpucce marked this pull request as draft October 22, 2023 22:45
@rwightman
Collaborator

@gpucce much clearer, haha, thanks.

So, I'm a bit confused by this. I get that the class token is being added in the model, but why would you want to pass a sequence of 77 tokens then, instead of 76?

context_length == the length of the sequence passed in the text input. So in essence, if you were passing 77 tokens and then adding a class token, it was ending up as 78, no?
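
To make the off-by-one concrete, here is a minimal, hypothetical sketch (not the actual open_clip code; the tower, widths, and names are invented for illustration): a text tower with context_length positional embeddings that appends a CLS embedding internally overflows by one position if it is fed context_length tokens.

    import torch
    import torch.nn as nn

    class TinyTextTower(nn.Module):
        def __init__(self, vocab_size=1000, width=64, context_length=77):
            super().__init__()
            self.context_length = context_length
            self.token_embedding = nn.Embedding(vocab_size, width)
            self.positional_embedding = nn.Parameter(torch.zeros(context_length, width))
            self.cls_emb = nn.Parameter(torch.zeros(1, 1, width))  # appended inside forward

        def forward(self, text):
            x = self.token_embedding(text)                 # [B, seq_len, width]
            cls = self.cls_emb.expand(x.shape[0], -1, -1)  # [B, 1, width]
            x = torch.cat([x, cls], dim=1)                 # [B, seq_len + 1, width]
            # Fails when seq_len + 1 > context_length: only 77 positions exist.
            return x + self.positional_embedding[: x.shape[1]]

    tower = TinyTextTower()
    print(tower(torch.zeros(2, 76, dtype=torch.long)).shape)  # 76 + 1 == 77 -> OK
    try:
        tower(torch.zeros(2, 77, dtype=torch.long))            # 77 + 1 == 78 -> overflow
    except RuntimeError as err:
        print("off by one:", err)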

@gpucce
Contributor Author

gpucce commented Oct 22, 2023

@rwightman

> @gpucce much clearer, haha, thanks.
>
> So, I'm a bit confused by this. I get that the class token is being added in the model, but why would you want to pass a sequence of 77 tokens then, instead of 76?
>
> context_length == the length of the sequence passed in the text input. So in essence, if you were passing 77 tokens and then adding a class token, it was ending up as 78, no?

Honestly, I am not 100% sure why I did it. I think it initially errored with more than 77 tokens; the issue was an inconsistency between the text encoder and the multimodal decoder that would error if the context length was longer than 77. It could be that it works now. I am rushing a bit because it is very late, but I think this keeps things the same as they have been until now, just with the right tokenizer returned by get_tokenizer, as you said.

Relatedly, I don't think I can continue with the regression tests now, and I will be very busy over the next few days. Can I ask how urgent it is to have them finished?

@rwightman
Collaborator

@gpucce generation seems to be working okay. There might be an off-by-one issue related to this, but if it's erring on the lower side (passing 76 tokens) it shouldn't be harmful, no? I can poke around a bit more and see if there is any reason to hold off the release, but either way, get some sleep, and thanks for spending the time that you have.

@gpucce
Contributor Author

gpucce commented Oct 22, 2023

> @gpucce generation seems to be working okay. There might be an off-by-one issue related to this, but if it's erring on the lower side (passing 76 tokens) it shouldn't be harmful, no? I can poke around a bit more and see if there is any reason to hold off the release, but either way, get some sleep, and thanks for spending the time that you have.

I think the model itself is in fact totally fine: the logits from the forward pass are the same as they used to be and as they have been across several older versions, and I have been testing the whole thing a bit today, so it should be fine. The issue I had was this off-by-one, but I am now very confident it is only in the tokenizer, not in the model.
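
For reference, a hedged sketch of the kind of check being described (not the project's actual regression test; the model name, pretrained tag, and reference file are assumed for illustration): run the same fixed inputs through the current code and compare the outputs against values saved from an older release.

    import torch
    import open_clip

    # Assumed model/pretrained names; real weights are needed for the comparison to mean anything.
    model, _, _ = open_clip.create_model_and_transforms(
        'coca_ViT-B-32', pretrained='laion2b_s13b_b90k')
    tokenizer = open_clip.get_tokenizer('coca_ViT-B-32')
    model.eval()

    text = tokenizer(["a photo of a cat"])
    image = torch.zeros(1, 3, 224, 224)  # fixed dummy image for reproducibility

    with torch.no_grad():
        text_feats = model.encode_text(text)
        image_feats = model.encode_image(image)

    # `coca_reference.pt` is a hypothetical file holding the same features
    # captured with the previous open_clip release.
    # ref = torch.load("coca_reference.pt")
    # assert torch.allclose(text_feats, ref["text"], atol=1e-5)
    # assert torch.allclose(image_feats, ref["image"], atol=1e-5)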

@rwightman
Collaborator

@gpucce yeah, so the tokenizer behaviour did change, but I think the issue is a bit of muddled logic w.r.t. seq len / context len handling in the model. The context_length attribute should reflect the length of the sequence that can be passed into the model inputs. You could pass a 77-length input to this model even though context_length was 76. Tokenizers are doing what they should now. But before, we always fed 77 tokens into the model even though the context length didn't match that, and the model logic was adapted to work around it...
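
As a hedged illustration of what "tokenizers doing what they should" means here (model name assumed for illustration): get_tokenizer should now produce tensors of the config's context_length, leaving room for the internally appended CLS embedding.

    import open_clip

    tokenizer = open_clip.get_tokenizer('coca_ViT-B-32')  # model name assumed
    tokens = tokenizer(["a photo of a cat", "a diagram"])
    # With context_length set to 76 in the text config, this should print
    # torch.Size([2, 76]); the model then appends its CLS embedding to reach 77.
    print(tokens.shape)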

@gpucce
Contributor Author

gpucce commented Oct 22, 2023

> @gpucce yeah, so the tokenizer behaviour did change, but I think the issue is a bit of muddled logic w.r.t. seq len / context len handling in the model. The context_length attribute should reflect the length of the sequence that can be passed into the model inputs. You could pass a 77-length input to this model even though context_length was 76. Tokenizers are doing what they should now. But before, we always fed 77 tokens into the model even though the context length didn't match that, and the model logic was adapted to work around it...

@rwightman I agree with all of it; my only comment was that I couldn't find any other issue with generation, so hopefully there aren't any.

@rwightman
Collaborator

@gpucce k, thanks. Yeah, it seems like generation is probably okay, so I'll investigate this issue a bit more to make sure it doesn't cause any significant problems.

@rwightman
Collaborator

rwightman commented Oct 23, 2023

k, so it strikes me that the embed_cls bool arg should be removed altogether? It's always chopping the last token off the input when True (which seems like it could cause other problems, e.g. removing the EOT token in some cases). It looks like it was added to work around the tokenizers having a constant 77-token output? With tokenizers obeying the context_length attr in the model, it seems we should leave that at 76, and we can safely add the cls embed without going over.


    def _encode_text(self, text, normalize: bool = True, embed_cls: bool = True):
        text = text[:, :-1] if embed_cls else text # make space for CLS token
        text_latent, token_emb = self.text(text)
        text_latent = F.normalize(text_latent, dim=-1) if normalize else text_latent
        return text_latent, token_emb

    def encode_image(self, images, normalize: bool = True):
        image_latent, _ = self._encode_image(images, normalize=normalize)
        return image_latent

    def encode_text(self, text, normalize: bool = True, embed_cls: bool = True):
        text_latent, _ = self._encode_text(text, normalize=normalize, embed_cls=embed_cls)
        return text_latent

@gpucce
Contributor Author

gpucce commented Oct 23, 2023

#692 addresses this in a better way

@gpucce gpucce closed this Oct 23, 2023