sentence piece tokenizer support for TokenizerInfo #120
Looks great!
Could you post benchmark results in a comment on this PR, including:
- The CPU you are using
- The models used (three kinds: hf, tiktoken, sp)
- TokenizerInfo build time
We can merge it after addressing these issues.
CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Build-time tests:
- "microsoft/Phi-3.5-mini-instruct" (hf)
- "Qwen/Qwen-7B-Chat" (tiktoken)
- "THUDM/glm-4-9b-chat" (tiktoken)
- "THUDM/chatglm3-6b" (sp)
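For anyone reproducing build-time numbers like those requested above, here is a minimal timing sketch. It is not the script used for this PR. The helper name `time_build` and the stand-in workload are hypothetical; the harness only assumes the build step can be wrapped in a zero-argument callable.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_build(build: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Call `build` `repeats` times and return min/median/max wall time in seconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        build()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload (hypothetical). In practice the callable would wrap
    # the TokenizerInfo construction for an already-loaded tokenizer.
    stats = time_build(lambda: sorted(range(100_000), reverse=True), repeats=3)
    print({name: f"{seconds:.4f}s" for name, seconds in stats.items()})
```

Loading the tokenizer outside the timed callable keeps download and disk I/O out of the measurement, so the numbers reflect only the build step.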
Looks great. Thanks @zanderjiang!