This is a thin wrapper around GPT3Tokenizer that uses the HuggingFace RoBERTa vocab and merge files.
See the GPT3 documentation for example usage (or the generated test case under tests/).
To use the multilingual version, the SentencePiece dependency must be initialized and an additional model file must be downloaded:
composer exec -- php -r "require 'vendor/autoload.php'; Textualization\SentencePiece\Vendor::check();"
composer exec -- php -r "require 'vendor/autoload.php'; Textualization\Ropherta\Tokenizer\Vendor::check();"
We thank our sponsor: