A class project re-implementing a simple algorithm for unsupervised morpheme induction.

kaleidoescape/PyPortS

PyPortS

A Python3 implementation of Keshava and Pitler's (2006) "RePortS" algorithm for unsupervised morpheme induction.

Running PyPortS

PyPortS is trained on a corpus of words, supplied as one or more text files with one word per line.
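As a rough illustration of the training input format, the sketch below reads several one-word-per-line files into a single word count. The helper name `load_words` is hypothetical and is not taken from pyports.py; whether PyPortS itself counts word frequencies or only keeps distinct words is an assumption here, as is UTF-8 encoding.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def load_words(paths: Iterable[str]) -> Counter:
    """Count words across several one-word-per-line files.

    Hypothetical helper for illustration; not part of pyports.py.
    """
    counts: Counter = Counter()
    for path in paths:
        # Assumes the corpus files are UTF-8 encoded.
        with Path(path).open(encoding="utf-8") as handle:
            for line in handle:
                word = line.strip()
                if word:  # skip blank lines
                    counts[word] += 1
    return counts
```

Calling `load_words(["train1.txt", "train2.txt"])` would merge both files into one count, which mirrors passing multiple training files on the command line.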

It also requires a test corpus and a matching gold standard corpus. The test corpus is a text file with one word per line. The gold standard corpus contains the same words as the test corpus, in the same order, with a plus sign (+) between morphemes.
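Because the test and gold standard files must line up word for word, it can help to check them before training. The following is a minimal sketch of such a check, not code from pyports.py: the function names `parse_gold` and `check_alignment` are hypothetical, and the words in the usage note are made-up examples rather than entries from the included datasets.

```python
from typing import Iterable, List, Tuple


def parse_gold(entry: str) -> List[str]:
    """Split a gold standard entry such as 'walk+ed' into its morphemes."""
    return entry.strip().split("+")


def check_alignment(
    test_lines: Iterable[str], gold_lines: Iterable[str]
) -> List[Tuple[str, List[str]]]:
    """Pair each test word with its gold morphemes, raising on any mismatch.

    Hypothetical helper for illustration; not part of pyports.py.
    """
    test_words = [line.strip() for line in test_lines if line.strip()]
    gold_entries = [line.strip() for line in gold_lines if line.strip()]
    if len(test_words) != len(gold_entries):
        raise ValueError(
            f"{len(test_words)} test words but {len(gold_entries)} gold entries"
        )
    pairs = []
    for number, (word, entry) in enumerate(zip(test_words, gold_entries), start=1):
        morphemes = parse_gold(entry)
        # Removing the plus signs must give back the test word exactly.
        if "".join(morphemes) != word:
            raise ValueError(f"entry {number}: {entry!r} does not match {word!r}")
        pairs.append((word, morphemes))
    return pairs
```

For example, `check_alignment(["walked"], ["walk+ed"])` returns `[("walked", ["walk", "ed"])]`, while a gold entry that spells a different word raises a `ValueError`.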

Training requires a version identifier (ver_1.0.0 in the commands below), which is used to save the model for later reuse.

To train pyports:

$ python3 pyports.py train ver_1.0.0 test.txt gold.txt train1.txt train2.txt

To test pyports:

$ python3 pyports.py test ver_1.0.0 test.txt gold.txt

Datasets from the original project are included: Russian, English, Japanese (kanji), and Japanese (kunrei). To run all of them in the same configuration as in the original project:

$ python3 pyports.py standard
