- Segmented (thanks to github.com/fxsjy/jieba)
- Normalized / Tokenized Walk-through + Demo
- (optional) Project-specific keyword fetching (e.g. `程子`, `知本`, `強地動觀測計畫`)
- Revision cycle for unwanted jargon (e.g. `子曰`, `者也`), which free online UIs do not handle well
- Graphic customization
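
The revision cycle for unwanted jargon listed above can be sketched in plain Python. This is a minimal sketch under assumptions, not this project's actual code: `count_terms` and `drop_terms` are hypothetical names, and the token list is assumed to come from a segmenter such as `jieba.lcut`.

```python
from collections import Counter


def count_terms(tokens, drop_terms=(), min_length=1):
    """Count token frequencies, skipping blank tokens and unwanted jargon.

    `count_terms` and its parameters are hypothetical names used for
    illustration only.
    """
    drop = set(drop_terms)
    counts = Counter()
    for token in tokens:
        token = token.strip()
        if len(token) < min_length or token in drop:
            continue
        counts[token] += 1
    return counts


# Made-up tokens, shaped like the list a segmenter might return.
tokens = ["子曰", "大學", "之", "道", "者也", "大學", " "]

# One pass of the cycle: inspect the counts, extend drop_terms, recount.
drop_terms = {"子曰", "者也"}
frequencies = count_terms(tokens, drop_terms=drop_terms)
```

Project-specific keywords can be registered with `jieba.add_word` before segmenting so that they are kept as single tokens. The resulting frequency mapping is the kind of input a word-cloud renderer typically accepts.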
The Great Learning (禮記·大學) from the Book of Rites - Wordcloud
As You Like It by William Shakespeare - Wordcloud
One of our demos uses `jieba-tw` as the zh-Hant (Traditional Chinese) environment:
import jieba

# ZH ENVIRONMENT CONSTANTS
# Download the Traditional Chinese dictionary from jieba-tw
### Traditional Chinese jieba-tw dictionary
### [ https://raw.githubusercontent.com/ldkrsi/jieba-zh_TW/master/jieba/dict.txt ]
zh_DICT_FILEPATH = '/content/jieba/dict_jieba_tw_Dec2023.txt'
zh_DICT2_FILEPATH = '/content/jieba/dict_jeiba_tc_big.txt'
zh_STOPWORD_FILEPATH = '/content/jieba/sun-chinese-stopwords2023.txt'
# INIT traditional Chinese dictionary
jieba.set_dictionary(zh_DICT2_FILEPATH)
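
The stopword file named in the constants above still has to be read before it can filter anything. A minimal sketch of that step follows, assuming the file holds one stopword per line in UTF-8 (the actual format of the file is not shown here); `load_stopwords` and `remove_stopwords` are hypothetical helper names.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stopword per line, ignoring blank lines.

    Assumption: the file is UTF-8 text; "utf-8-sig" also tolerates a BOM.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep tokens that are non-blank and not in the stopword set."""
    return [token for token in tokens if token.strip() and token not in stopwords]
```

With these helpers, `load_stopwords(zh_STOPWORD_FILEPATH)` would give the set to filter the output of `jieba.lcut(text)` against.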