Does ScatterText somehow combine tokens? #132
Comments
Could you please provide a runnable example to show this? It's possible the tokenizer is merging those two words into a single token, or Scattertext ended up aligning them in the labeling phase.
But please submit a reproducible example of where this occurs. Otherwise, there's nothing I can do to look into this.
I just wanted to know first whether you are aware of any way something like that could happen. As brief background, I'm very familiar with the tokenizer I'm using ( Here is the data I'm using: Here is the one-liner to read it, to ensure consistency with the way I have it:
Here is the wrapper for the tokenizer I'm using, inspired by the
I realize you have a lot of experience with this token, but have you programmatically checked the tokenizer's output on this file to verify that the token in question isn't there?
Yes
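A programmatic check of this kind could look roughly like the sketch below. It is only an illustration: `tokenize` is a hypothetical whitespace-splitting stand-in for the actual tokenizer wrapper, `find_token_lines` is a helper name made up here, and the two sample lines are invented to show one case where the tokens stay separate and one where they arrive merged. None of this is Scattertext's own API.

```python
def tokenize(text):
    # Hypothetical stand-in for the real tokenizer wrapper:
    # it simply splits on whitespace.
    return text.split()


def find_token_lines(lines, tokenizer, suspect):
    """Return the 1-based numbers of lines where `suspect` is emitted as one token."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if suspect in tokenizer(line):
            hits.append(number)
    return hits


# The two tokens from the report, and the merged form seen in the plot.
first = "བྱང་ཆུབ་"
second = "སེམས་དཔ"
merged = first + second

# Invented sample data: line 1 keeps the tokens apart, line 2 has them joined.
lines = [first + " " + second, merged]

merged_hits = find_token_lines(lines, tokenize, merged)
print("Lines where the tokenizer itself emits the merged token:", merged_hits)
```

If a check like this returns an empty list when run on the real file with the real tokenizer, the merge is happening somewhere after tokenization; if it returns line numbers, the tokenizer output already contains the combined token on those lines.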
I have many cases where two tokens such as བྱང་ཆུབ་ and སེམས་དཔ become a single thing in the scatterplot. Is this something that ScatterText is doing? The tokenizer I'm using does not do that.