Project to generate a POS tag dictionary for the Ukrainian language

ostasevych/dict_uk

 
 


This project generates a part-of-speech (POS) tag dictionary for the Ukrainian language.

Description:

For every file in data/dict, the project generates all possible word forms with POS tags
by applying the affix rules from the files in data/affix.
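
To make the idea of affix expansion concrete, here is a minimal Python sketch. It is not the project's actual code: the rule layout (strip a suffix, add a suffix, assign a tag), the `AffixRule` and `expand` names, and the tag strings are all hypothetical and chosen only for illustration. The real rule syntax is defined by the files in data/affix.

```python
from typing import List, NamedTuple, Tuple


class AffixRule(NamedTuple):
    """Hypothetical rule layout, not the project's real affix format."""
    strip: str  # suffix removed from the lemma
    add: str    # suffix appended in its place
    tag: str    # POS tag assigned to the resulting form


def expand(lemma: str, rules: List[AffixRule]) -> List[Tuple[str, str, str]]:
    """Return (form, lemma, tag) for every rule whose suffix matches the lemma."""
    forms = []
    for rule in rules:
        if not lemma.endswith(rule.strip):
            continue  # rule does not apply to this lemma
        stem = lemma[: len(lemma) - len(rule.strip)]
        forms.append((stem + rule.add, lemma, rule.tag))
    return forms


# Illustrative rules and tags only (assumptions, not taken from data/affix).
RULES = [
    AffixRule("а", "а", "noun:nom"),
    AffixRule("а", "и", "noun:gen"),
    AffixRule("а", "у", "noun:acc"),
]

for form, lemma, tag in expand("книга", RULES):
    print(form, lemma, tag)
```

Each lemma is paired with every rule whose `strip` suffix it ends with, so one dictionary entry fans out into several tagged word forms.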

How to run:

# gradle expandForCorp
Output:

    * out/dict_corp_vis.txt - Dictionary in visual (indented) format for review, analysis or conversion
    * out/dict_corp_lt.txt - Dictionary for LT (LanguageTool), used for annotating the corpus
    * out/words.txt, out/lemmas.txt, out/tags.txt - Lists of all unique words, lemmas and tags

# gradle expandForRules
Output:

    * out/dict_rules_lt.txt - Dictionary file for LT (LanguageTool), used for grammar rule checking
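
The dictionary files listed above can be post-processed with a short script. The Python sketch below assumes that each line holds three whitespace-separated fields (word form, lemma, tag); this layout is an assumption and should be checked against the generated files before use. The `load_tagger_dict` name and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_tagger_dict(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each word form to its list of (lemma, tag) readings.

    Assumes "form lemma tag" per line; lines with any other shape are skipped.
    """
    readings = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # blank line or unexpected layout
        form, lemma, tag = parts
        readings[form].append((lemma, tag))
    return dict(readings)


# Made-up sample lines standing in for the real file contents.
SAMPLE = [
    "книга книга noun:nom",
    "книги книга noun:gen",
    "книги книга noun:nom:pl",
]

readings = load_tagger_dict(SAMPLE)
print(readings["книги"])
```

To read a generated file instead of the sample, pass an open file object, for example `open("out/dict_corp_lt.txt", encoding="utf-8")`, once the line layout has been confirmed.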
