
Question on the vocabulary size #26

Open
PPPPPsanG opened this issue Oct 15, 2024 · 1 comment

Comments

@PPPPPsanG

Emu3 is good work, but I have a question about it.
The vocabulary size of Qwen is 152064, while the codebook size of the vision tokenizer used in Emu3 is 32768.
Their sum is 184832, but the vocabulary size reported in Emu3 is 184622.
Why don't the numbers match?

@ryanzhangfan
Collaborator

We use the vocab.json from Qwen2, which has 151643 tokens, plus 32768 vision tokens, 205 extra tokens, and 6 special tokens, making a total vocabulary size of 184622.
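The arithmetic above can be sketched as a quick check (the variable names are illustrative; the counts are the ones stated in this thread):

```python
# Breakdown of Emu3's vocabulary size, per the maintainer's reply above.
text_tokens = 151_643     # entries in Qwen2's vocab.json
vision_tokens = 32_768    # vision tokenizer codebook size
extra_tokens = 205        # extra tokens
special_tokens = 6        # special tokens

total = text_tokens + vision_tokens + extra_tokens + special_tokens
print(total)  # 184622

# The questioner's 152064 is larger than the vocab.json count by:
print(152_064 - 151_643)  # 421
```

The mismatch in the question comes from using 152064 (Qwen's reported vocabulary/embedding size, which includes slots beyond the vocab.json entries) rather than the 151643 entries actually present in vocab.json.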
