Tatar stopwords

This repository contains a list of stopwords for the Tatar language.

Method

The list was constructed manually, based on word frequency distributions obtained from news texts. It consists mostly of function words (conjunctions, postpositions, interjections), together with pronouns, numerals, some high-frequency verbs such as "диде" ("said"), and a few parenthetical words.

Current count: 1006 wordforms (~300 unique lemmata).
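
A minimal usage sketch in Python follows. It assumes the list is stored as plain text with one wordform per line, which this README does not confirm. The helper names `load_stopwords` and `remove_stopwords` are hypothetical. The in-memory sample stands in for the real file, and only "диде" is taken from the description above; the other sample entries are illustrative assumptions.

```python
import io
import re


def load_stopwords(lines):
    """Build a set from an iterable of lines (assumed layout: one wordform per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop exact stopword matches."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


# Stand-in for the real list file; replace with open(<path to the list>, encoding="utf-8").
# Only "диде" is confirmed by the README; "һәм" and "белән" are illustrative assumptions.
sample_file = io.StringIO("диде\nһәм\nбелән\n")
stopwords = load_stopwords(sample_file)

tokens = remove_stopwords("Әни китап һәм дәфтәр алды, диде", stopwords)
print(tokens)
```

Because the list contains surface wordforms rather than only lemmata, exact string matching as above should work without a lemmatizer or morphological analyzer.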

Acknowledgments

Some rare function words were taken from Apertium. Additional surface wordforms were generated automatically, also using Apertium.