fix coca_cfg #691
Conversation
@gpucce much clearer, haha, thanks. So, I'm a bit confused by this. I get that the class token is being added in the model, but why would you want to pass a sequence of 77 tokens then instead of 76? context_length == the length of the sequence passed in the text input. So in essence, if you were passing 77 tokens and then adding a class token, it was ending up as 78, no?
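A minimal sketch of the off-by-one being described here, using stand-in tensors (the shapes and names are illustrative, not the actual open_clip code): if the tokenizer emits 77 tokens and the model then concatenates a learned class embedding, the effective sequence length becomes 78.

```python
import torch

# Illustrative only: embedded text of length 77 plus one class-token embedding.
context_length = 77                               # what the tokenizer was emitting
embed_dim = 512
text_embs = torch.zeros(1, context_length, embed_dim)
cls_emb = torch.zeros(1, 1, embed_dim)            # stand-in learned class-token embedding

with_cls = torch.cat([text_embs, cls_emb], dim=1)
print(with_cls.shape[1])                          # 78 -- one more than context_length
```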
Honestly, I am not 100% sure why I did it. I think initially it would error with more than 77 tokens; the issue was a sort of inconsistency between the text encoder and the multimodal decoder that would error if the context length was longer than 77. It could be that it works now. I am rushing a bit because it is very late, and I think this would keep it the same as it has been until now, but with the right tokenizer. Relatedly, I don't think I can continue with the regression tests now, and in the next few days I will be very busy; can I ask how urgent it is to have them finished?
@gpucce generation seems to be working okay. There might be an off-by-one issue related to this, but if it's erring on the lower side (passing 76 tokens) it shouldn't be harmful, no? I can poke around a bit more and see if there is any reason to hold off a release, but either way, get some sleep, thanks for spending the time that you have.
I think the model itself is in fact totally fine: the logits from the forward are the same as they used to be and as they have been in several older versions. I have been trying the whole thing a bit today, so it should be fine. The issue I had was this off-by-one, but I am now very confident it is only in the tokenizer, not in the model.
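For what it's worth, the regression check being discussed boils down to something like the sketch below (purely illustrative, with stand-in tensors rather than actual model outputs): run the same fixed text batch through the old and new builds and compare the forward logits.

```python
import torch

def logits_match(old_logits: torch.Tensor, new_logits: torch.Tensor, atol: float = 1e-5) -> bool:
    # Compare logits from two builds of the model on the same fixed input batch.
    return torch.allclose(old_logits, new_logits, atol=atol)

# Stand-in tensors; in practice these would come from forward passes of each build.
ref = torch.randn(4, 512)
assert logits_match(ref, ref.clone())
```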
@gpucce yeah, so the tokenizer behaviour did change, but I think the issue is a bit of muddled logic wrt seq len / context len handling in the model. The context_length attribute should reflect the length of the sequence that can be passed into the model inputs. You could pass a 77-length input to this model even though context_length was 76. Tokenizers are doing what they should now. But before, we always fed 77 tokens into the model even though the context length didn't match that, and the model logic was adapted to work around that...
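A rough sketch of the intended contract (illustrative, not the real open_clip tokenizer): the tokenizer pads or truncates to the model's context_length, and the model can then add its class token without exceeding the positional budget it was built for.

```python
import torch

def toy_tokenize(token_ids, context_length):
    # Pad or truncate a list of token ids to exactly context_length (toy example).
    token_ids = token_ids[:context_length]
    return torch.tensor(token_ids + [0] * (context_length - len(token_ids)))

context_length = 76                     # model attribute the tokenizer should obey
ids = toy_tokenize(list(range(100)), context_length)
print(ids.shape)                        # torch.Size([76])
# Model side: adding one cls position gives 77 total, back within the original budget.
```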
@rwightman I agree on all of it; my only comment was that I couldn't find any other issue with generation, so hopefully there aren't any.
@gpucce k, thanks, yeah it seems like the generation is probably okay, so I'll investigate this issue a bit more to ensure that it doesn't cause any significant issues.
k, so it strikes me that the embed_cls bool arg should be removed altogether? It's always chopping the last token off the input when True (which seems like it could cause other problems, i.e. by removing the EOT token in some cases). It looks like it was added to work around the tokenizers having constant 77-token output? With tokenizers obeying the context_length attr in the model, it seems we should leave that at 76 and we can safely add the cls embed without going over.
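To illustrate the concern about chopping the last token (hypothetical example; 49407 is the EOT id in the standard CLIP BPE vocab): if a caption is long enough that the tokenizer fills all 77 slots, dropping the last column discards the EOT token.

```python
import torch

eot_id = 49407                                   # EOT token id in the CLIP BPE vocab
tokens = torch.full((1, 77), 42)                 # stand-in ids for a long caption
tokens[0, -1] = eot_id                           # EOT ends up in the final slot

chopped = tokens[:, :-1]                         # what embed_cls=True effectively did
print(eot_id in chopped[0].tolist())             # False -- the EOT token is gone

# With context_length = 76, the tokenizer leaves one slot free, so adding the
# cls embed never requires dropping real tokens.
```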
#692 addresses this in a better way