Code Table Build Process for This Schema
sgalal edited this page Apr 27, 2020 · 3 revisions
For the dictionary build process of this schema, see the build branch of this repository.
Install sgalal/opencc-python
$ git clone https://github.com/sgalal/opencc-python.git
$ cd opencc-python
$ python setup.py install
Install dependencies
$ pip install unihan-etl pandas sortedcontainers
$ unihan-etl -f kCantonese -F json --destination build/single_char/data/0-Unihan.json
$ build/build.py
- Export the Cantonese pronunciation data in kCantonese to build/single_char/data/0-Unihan.json
- Download and process the five data files mentioned above to /build/single_char/data/0-*
- Sanitize the five data files and save them to /build/single_char/data/1-*
- Generate the result according to the principles, then save it to the variable d_single_char
- Download the LSHK Word List to /build/word/data/香港語言學學會粵拼詞表.txt
- Read the file, discard the single-character entries, and save the remaining data to the variable d_word
- Write d_single_char and d_word to file
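The last two steps can be sketched roughly as follows. This is an illustration only, not the code in build/build.py: the helper names drop_single_chars and merge_entries, the (word, romanization) tuple layout, and the tab-separated output format are all assumptions made for the example, and the sample entries are hand-written rather than taken from the real data files.

```python
def drop_single_chars(entries):
    """Keep only (word, romanization) pairs whose word is longer than one character."""
    return [(word, jyutping) for word, jyutping in entries if len(word) > 1]


def merge_entries(d_single_char, d_word):
    """Join both tables into tab-separated lines, single characters first."""
    rows = list(d_single_char) + list(d_word)
    return ["{}\t{}".format(word, jyutping) for word, jyutping in rows]


# Hand-written sample data (hypothetical; not read from the real data files).
d_single_char = [("粵", "jyut6")]
raw_words = [
    ("粵", "jyut6"),            # single character: discarded from the word table
    ("粵語", "jyut6 jyu5"),
    ("香港", "hoeng1 gong2"),
]

d_word = drop_single_chars(raw_words)
lines = merge_entries(d_single_char, d_word)
```

Writing the result out is then a matter of joining lines with newlines and saving them with UTF-8 encoding. Single characters are filtered out of the word list here because, per the steps above, they are already covered by d_single_char.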