
ieee-sun/zh-word-cloud


Chinese Word-Cloud rendering

Drawing word clouds (繪製文字雲)

  • Segmentation (thanks to github.com/fxsjy/jieba)
  • Normalization / tokenization walk-through + demo
  • (Optional) Fetching project-specific keywords (e.g. 程子, 知本, 強地動觀測計畫)
  • Revision cycle for filtering out unwanted jargon (e.g. 子曰, 者也), which free online word-cloud UIs handle poorly
  • Graphic customization
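
The filtering and counting steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `term_frequencies`, `whitespace_segment`, and the `min_len` parameter are hypothetical names introduced here, and a whitespace splitter stands in for the real segmenter so the snippet runs without extra dependencies.

```python
from collections import Counter
from typing import Callable, Iterable


def term_frequencies(
    text: str,
    segment: Callable[[str], Iterable[str]],
    stopwords: Iterable[str] = (),
    min_len: int = 2,
) -> Counter:
    """Segment text, drop stopwords and very short tokens, count the rest.

    `segment` is any callable that turns a string into tokens.
    `min_len` is an assumed heuristic for dropping single-character particles.
    """
    blocked = set(stopwords)
    tokens = (tok.strip() for tok in segment(text))
    kept = [tok for tok in tokens if len(tok) >= min_len and tok not in blocked]
    return Counter(kept)


def whitespace_segment(text: str) -> list:
    """Stand-in segmenter for this sketch: split on whitespace."""
    return text.split()


# Pre-split sample text; 子曰 is treated as unwanted jargon.
freqs = term_frequencies(
    "大學 之 道 在 明明德 子曰 大學",
    whitespace_segment,
    stopwords={"子曰"},
)
print(freqs.most_common())
```

With jieba installed, `jieba.lcut` can presumably be passed as `segment` in place of the stand-in, and the resulting counts can be handed to the `wordcloud` package's `WordCloud.generate_from_frequencies` (with a `font_path` pointing at a CJK-capable font) for rendering.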

Python :: 3.10 :: Chinese Segmentation Cultural Open Access

Word cloud: 禮記·大學 (The Great Learning, from the Book of Rites)

As You Like It by William Shakespeare - Wordcloud

One of our demos uses jieba-tw as the zh-Hant (Traditional Chinese) environment:

# ZH ENVIRONMENT CONSTANTS
import jieba

# Download the Traditional Chinese dictionary from jieba-tw
# 繁體字 jieba-tw 詞庫 (Traditional Chinese jieba-tw dictionary)
# [ https://raw.githubusercontent.com/ldkrsi/jieba-zh_TW/master/jieba/dict.txt ]

zh_DICT_FILEPATH = '/content/jieba/dict_jieba_tw_Dec2023.txt'
zh_DICT2_FILEPATH = '/content/jieba/dict_jeiba_tc_big.txt'
zh_STOPWORD_FILEPATH = '/content/jieba/sun-chinese-stopwords2023.txt'

# INIT: point jieba at the Traditional Chinese dictionary
jieba.set_dictionary(zh_DICT2_FILEPATH)
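
The stopword file named by `zh_STOPWORD_FILEPATH` still has to be read before it can filter anything. A minimal sketch of that step follows, assuming the file is UTF-8 text with one stopword per line (the format is not stated in this README); `load_stopwords` is a hypothetical helper, not a function from the notebook.

```python
from pathlib import Path


def load_stopwords(path: str) -> set:
    """Read a stopword list, assuming one entry per line in UTF-8.

    "utf-8-sig" also tolerates a leading byte-order mark, if one is present.
    Blank lines are skipped and duplicates collapse because a set is returned.
    """
    lines = Path(path).read_text(encoding="utf-8-sig").splitlines()
    return {line.strip() for line in lines if line.strip()}
```

A call such as `load_stopwords(zh_STOPWORD_FILEPATH)` would then yield a set suitable for filtering segmented tokens. For the optional project-specific keywords, jieba offers `jieba.add_word` and `jieba.load_userdict`, which could be used to keep terms such as 強地動觀測計畫 from being split during segmentation.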

Walk through our tutorial in this Jupyter Notebook (.ipynb).


Prepared & Published by:
Sun CHUNG, SMIEEE M.Sc. HKU - colab w/ MIT-IDSS