PyPortS

A Python3 implementation of Keshava and Pitler's (2006) "RePortS" algorithm for unsupervised morpheme induction.

Running PyPortS

PyPortS is trained on a corpus of words, supplied as one or more text files with one word per line.
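If your training data is running text rather than a word list, it has to be converted to the one-word-per-line layout first. The following is a minimal sketch of that conversion, not part of PyPortS itself: the helper name `to_word_lines` is hypothetical, and it assumes plain whitespace tokenization. Whether PyPortS expects lowercased or punctuation-free words is not stated here, so the sketch leaves the tokens unchanged.

```python
def to_word_lines(raw_text: str) -> str:
    """Return raw_text reformatted with one whitespace-delimited token per line.

    Hypothetical helper for illustration; tokens are not lowercased or
    stripped of punctuation.
    """
    # str.split() with no arguments splits on runs of whitespace
    # (spaces, tabs, newlines) and never yields empty strings.
    words = raw_text.split()
    return "".join(word + "\n" for word in words)


if __name__ == "__main__":
    # Print the converted text; redirect it to a file such as train1.txt.
    print(to_word_lines("the cats  walked\nhome"), end="")
```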

It also requires a test corpus with a matching gold standard corpus. The test corpus is a text file with one word per line. The gold standard corpus has the same words as the test corpus, in the same order, but with plus signs (+) between the morphemes.
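Because the gold standard must list the same words as the test corpus, in the same order, it can be worth checking the two files against each other before training. The sketch below is one possible check and is not part of PyPortS: `read_words` and `find_mismatches` are hypothetical names, and the check assumes that removing every plus sign from a gold entry should give back the test word exactly.

```python
def read_words(path):
    """Read a one-word-per-line file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def find_mismatches(test_words, gold_words):
    """Return (line_number, test_word, gold_entry) for every misaligned pair.

    A pair is aligned when the gold entry with its '+' morpheme
    boundaries removed is identical to the test word.
    """
    if len(test_words) != len(gold_words):
        raise ValueError(
            f"test corpus has {len(test_words)} words, "
            f"gold standard has {len(gold_words)}"
        )
    mismatches = []
    for line_no, (word, segmented) in enumerate(zip(test_words, gold_words), start=1):
        if segmented.replace("+", "") != word:
            mismatches.append((line_no, word, segmented))
    return mismatches
```

With files named as in the commands below, `find_mismatches(read_words("test.txt"), read_words("gold.txt"))` returns an empty list when the two corpora line up.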

Training requires a version label (for example, ver_1.0.0) as the first argument after train. The trained model is saved under this label so that it can be reused later, for instance by the test command.

To train PyPortS, pass the version label, the test corpus, the gold standard, and then one or more training files:

$ python3 pyports.py train ver_1.0.0 test.txt gold.txt train1.txt train2.txt

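To test PyPortS with a previously trained model, pass the same version label, the test corpus, and the gold standard: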

$ python3 pyports.py test ver_1.0.0 test.txt gold.txt

Some datasets are already included from the original project (Russian, English, Japanese (kanji), Japanese (kunrei)). To run all of them in the same configuration as in the original project:

$ python3 pyports.py standard
