ChnHistPhon - Chinese Historical Phonology

Experiments in Chinese Historical Phonology using matrix decomposition and factorization methods.

Prerequisites

We use Python to prepare the data. The following packages are required:

pandas
numpy
cjklib
vPhon, a Vietnamese phonetizer: clone it to your local directory /path/to/vPhon
fancyimpute: install it from its GitHub repository

In addition to cjklib, the Unihan Database is used. The latest Unihan.zip can be downloaded from https://www.unicode.org/Public/UCD/. Unzip it to /path/to/Unihan.
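As a rough illustration of what the unzipped Unihan files contain, the sketch below parses the tab-separated format used by files such as Unihan_Readings.txt (code point, field name, value; comment lines start with #). The function names parse_unihan_lines and load_unihan_readings are hypothetical, and which Unihan fields the data preparation script actually reads is not stated here, so the kMandarin and kCantonese sample lines are only an assumption for demonstration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_unihan_lines(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse Unihan-style lines into {character: {field: value}}.

    Hypothetical helper for illustration; not part of this repository.
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines, which start with '#'.
        if not line or line.startswith("#"):
            continue
        # Each data line is: code point <TAB> field name <TAB> value.
        codepoint, field, value = line.split("\t", 2)
        # Code points are written like 'U+4E00'; convert to the character.
        char = chr(int(codepoint[2:], 16))
        table[char][field] = value
    return dict(table)


def load_unihan_readings(path: str) -> Dict[str, Dict[str, str]]:
    """Read one Unihan text file, e.g. /path/to/Unihan/Unihan_Readings.txt."""
    with open(path, encoding="utf-8") as handle:
        return parse_unihan_lines(handle)


# Small in-memory sample in the same format (illustrative lines only).
sample = [
    "# comment line",
    "U+4E00\tkCantonese\tjat1",
    "U+4E00\tkMandarin\tyī",
]
readings = parse_unihan_lines(sample)
```

Keeping the parser separate from the file reader makes it possible to try the parsing logic on a few lines before pointing it at the full file.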

Running experiments

Prepare data

Once you have cloned this repository to your local /path/to/ChnHistPhon, you can run

python /path/to/ChnHistPhon/ChnHistPhon_1_data_preparation.py

which creates ChnCharData.csv, the dataset of Chinese characters used in the experiments, in /path/to/ChnHistPhon/results.

Perform low-rank SVD

We use softImpute (Mazumder et al., 2010) to complete the data matrix in ChnCharData.csv, followed by dictionary learning and sparse coding. Both steps are run by ChnHistPhon_2_run_SoftImpute_DictionaryLearning.py.
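To give a feel for the matrix completion step, here is a minimal NumPy sketch of the softImpute idea: repeatedly fill the missing entries with the current estimate, take an SVD, and soft-threshold the singular values. It is not the code of the script above, which presumably relies on fancyimpute. The function name soft_impute, the shrinkage value, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def soft_impute(X, shrinkage, max_iters=200, tol=1e-5):
    """Fill NaN entries of X by iterative soft-thresholded SVD.

    Illustrative sketch only; parameter names and defaults are assumptions.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Initial estimate: observed values, zeros in the missing positions.
    Z = np.where(observed, X, 0.0)
    for _ in range(max_iters):
        # Keep observed entries fixed; use the current estimate elsewhere.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values, which encourages low rank.
        s_shrunk = np.maximum(s - shrinkage, 0.0)
        Z_new = (U * s_shrunk) @ Vt
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    # Return observed values unchanged, with estimates in the gaps.
    return np.where(observed, X, Z)


# Toy example: a rank-1 matrix with two entries hidden.
truth = np.outer(np.arange(1.0, 7.0), np.arange(1.0, 6.0))
X = truth.copy()
X[0, 1] = np.nan
X[3, 4] = np.nan
completed = soft_impute(X, shrinkage=0.1)
```

A larger shrinkage value gives a lower-rank, more heavily regularized estimate. In practice it would be tuned, for example by hiding some observed entries and checking how well they are recovered.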

Results

The results can be viewed here.
