-
TokenMonster is not language-dependent. It will work with any language that uses a space as a word boundary (non-standard spaces are also fine). It will also work with languages that don't use spaces as word boundaries, but you'll need to use
-
I trained a model for English-Russian. It can be downloaded from IPFS: https://ipfs.io/ipfs/QmPhxHrNyogBnzxY5onAnkvvgg78RP26R1XXrv3Ka6Qc9J?filename=russian.vocab
-
Do you plan to support more languages?
Also, it seems some of the optimizations were made with Western languages in mind. Do you plan to explore how this tokenizer works with non-Western languages?