Issues: google/sentencepiece
Issues list
With unigram algorithm, constant piece at end of each sentences does not become a token
#1047 (bug) opened Aug 29, 2024 by jogardi
Error Attribute Error: type object 'SentencePieceTrainer' has no attribute 'train'. Did you mean: 'Train'?
#1046 opened Aug 23, 2024 by bop578530
When I set SPM_PROTOBUF_PROVIDER to "package" in CMakeLists.txt, the compilation fails.
#1029 opened Jun 25, 2024 by hhxdestiny
High frequency token segmented into letter sequence when input is a tsv file
#967 (bug) opened Jan 30, 2024 by TingxunShi
A recent EMNLP work to share about task-adaptive tokenization with variable segmentation
#924 opened Oct 24, 2023 by lsy641
Unexpected behavior with sampling of repeated character sequence.
#904 opened Aug 14, 2023 by kellymarchisio
Python from source on armv7l raises ' undefined symbol: __atomic_fetch_add_8 '
#865 opened May 17, 2023 by FrancescoScandiffio
tokens listed in user_defined_symbols tokenized as unknowns when using the "word" model_type
#801 (bug) opened Dec 15, 2022 by lintangsutawika