Hey!
Thanks for this great library! It let us use the tokenizer without installing the whole transformers library.
How can I map the tokens I get from the Hugging Face DistilBertTokenizer to their character positions in the input text? For example: "I have a new GPU" -> ["i", "have", "a", "new", "gp", "##u"] -> [(0, 1), (2, 6), ...]
I'm interested in this because, given attention values assigned to each token, I would like to show which part of the original text each value corresponds to. The tokenized version is not friendly to non-ML people.
I have not found a solution to this. The library only supports the Encode and Decode methods. Any insights would be appreciated. Thank you!
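To make the request concrete, here is a minimal sketch of the mapping I have in mind, written in plain Python with no dependency on this library. It assumes an uncased WordPiece vocabulary, "##" as the continuation prefix, and the usual BERT special tokens. The function name `align_wordpieces` is hypothetical and is not part of any library. The sketch walks the text left to right and would not handle accent stripping or characters whose lowercase form changes length.

```python
from typing import List, Optional, Tuple

# Assumption: the usual BERT special tokens, which have no span in the text.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]", "[UNK]", "[MASK]"}


def align_wordpieces(text: str, tokens: List[str]) -> List[Optional[Tuple[int, int]]]:
    """Greedily align WordPiece tokens to (start, end) character spans in text.

    Returns None for tokens that cannot be located (special tokens, or pieces
    altered by normalization). End offsets are exclusive.
    """
    lowered = text.lower()  # assumption: uncased model, lowercasing keeps length
    spans: List[Optional[Tuple[int, int]]] = []
    cursor = 0  # position just past the last matched piece
    for token in tokens:
        if token in SPECIAL_TOKENS:
            spans.append(None)
            continue
        if token.startswith("##"):
            # A continuation piece must start exactly where the previous one ended.
            piece = token[2:]
            start = cursor if lowered.startswith(piece, cursor) else -1
        else:
            # A word-initial piece may be preceded by whitespace, so search forward.
            piece = token
            start = lowered.find(piece, cursor)
        if start == -1:
            spans.append(None)
            continue
        end = start + len(piece)
        spans.append((start, end))
        cursor = end
    return spans


tokens = ["i", "have", "a", "new", "gp", "##u"]
print(align_wordpieces("I have a new GPU", tokens))
```

A built-in equivalent that returns these offsets alongside the token IDs would be ideal, since the tokenizer already knows where each piece came from and would not need to guess after normalization.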