HMM-POS-Tagger

The corpus has been adapted from the Catalan portion of WikiCorpus v. 1.0, as follows:

The corpus contains only a selection (< 1.2M words) from the original set.
The corpus contains only tokens and parts of speech, not lemmas and word senses.
The part-of-speech tags have been simplified from the original, resulting in 29 tags.
The format has been changed to the word/TAG format, with each sentence on a separate line.
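
As an illustration of the word/TAG format described above, here is a minimal sketch of how one line of the tagged corpus might be read. The tag names in the sample (`DA`, `NC`, `VM`, `FP`, `Z`) and the helper names `parse_tagged_line` and `count_tags` are hypothetical; the actual 29-tag set is not listed here. The sketch also assumes tokens are separated by whitespace and that the tag follows the last slash of each token, so that words containing a slash survive.

```python
from collections import Counter


def parse_tagged_line(line):
    """Split one sentence line into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST slash so a word such as "1/2" keeps its own slash.
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


def count_tags(lines):
    """Count how often each tag occurs across an iterable of sentence lines."""
    counts = Counter()
    for line in lines:
        for _, tag in parse_tagged_line(line):
            counts[tag] += 1
    return counts


# Hypothetical sample lines; the tags are illustrative, not taken from the corpus.
sample = ["El/DA gat/NC dorm/VM ./FP", "1/2/Z"]
for sentence in sample:
    print(parse_tagged_line(sentence))
print(count_tags(sample))
```

To run the same functions over the training data, the lines of `catalan_corpus_train_tagged.txt` could be passed to `count_tags` through an open file handle (UTF-8 encoding is an assumption).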

The corpus is licensed under the same terms as the original, that is, the GNU Free Documentation License (FDL; http://www.fsf.org/licensing/licenses/fdl.html). That means that you are allowed to use and redistribute the texts, provided the derived works keep the same license.

Files in this repository:

README.md
catalan_corpus_dev_raw.txt
catalan_corpus_dev_tagged.txt
catalan_corpus_train_tagged.txt
hmmdecode3.py
hmmlearn3.py

amjha/HMM-POS-Tagger
