BE kem pertama dalam Bahasa Melayu. 350 Pax Pemimpin daripada Malaysia, Singapura, Brunei Dan Indonesia!!! Marilah kita membawa gelombang #BEInternational ke Pasaran Melayu!!!🔥🔥🔥🔥🔥
With the text above, I got this result:
[['BE', 'kem', 'pertama', 'dalam', 'Bahasa', 'Melayu', '.'], ['350', 'Pax', 'Pemimpin', 'daripada', 'Malaysia', ',', 'Singapura', ',', 'Brunei', 'Dan', 'Indonesia', '!'], ['!'], ['!'], ['Marilah', 'kita', 'membawa', 'gelombang', '#', 'BEInternational', 'ke', 'Pasaran', 'Melayu', '!'], ['!'], ['!'], ['🔥', '🔥', '🔥', '🔥', '🔥']]
I want to get "!!!" as a single token.
Thanks!
How did you run the tokenizer? I tested it like this ("en_PTB" and "de_CMC" give the same results on this input – I don't know which one would be more appropriate for Malaysian):
from somajo import SoMaJo

tokenizer = SoMaJo("en_PTB")
paragraphs = ["BE kem pertama dalam Bahasa Melayu. 350 Pax Pemimpin daripada Malaysia, Singapura, Brunei Dan Indonesia!!! Marilah kita membawa gelombang #BEInternational ke Pasaran Melayu!!!🔥🔥🔥🔥🔥"]
sentences = tokenizer.tokenize_text(paragraphs)
print([[token.text for token in s] for s in sentences])
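If your output still looks like the result quoted in the issue, one tokenizer-independent workaround might be to post-process the nested list of token strings. The sketch below is plain Python and does not use any SoMaJo API; the helper name `merge_repeated_punctuation` and its `marks` parameter are made up for illustration. It only handles the case from the issue, where each extra mark ends up as a one-token "sentence" of its own.

```python
def merge_repeated_punctuation(sentences, marks=("!", "?")):
    """Glue stray one-mark sentences onto the previous sentence's final token.

    `sentences` is a list of lists of token strings. Hypothetical helper,
    not part of SoMaJo.
    """
    result = []
    for sentence in sentences:
        tokens = list(sentence)
        is_stray_mark = len(tokens) == 1 and tokens[0] in marks
        if (
            is_stray_mark
            and result
            and result[-1]
            # The previous sentence must end in a run of the same mark.
            and set(result[-1][-1]) == {tokens[0]}
        ):
            result[-1][-1] += tokens[0]
        else:
            result.append(tokens)
    return result


tokenized = [["Brunei", "Dan", "Indonesia", "!"], ["!"], ["!"], ["Marilah", "kita"]]
print(merge_repeated_punctuation(tokenized))
# [['Brunei', 'Dan', 'Indonesia', '!!!'], ['Marilah', 'kita']]
```

Working on plain strings keeps the sketch independent of the tokenizer version, at the cost of losing any per-token metadata.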