
Proper nouns and named entities.

Ahmet A. Akın edited this page Feb 20, 2016 · 3 revisions

In NLP terminology, most applications use the concept of "Named Entities" (NEs) instead of "Proper Nouns". NEs often consist of more than one word, for example:

"Türkiye", "Cahit Arf", "Kemal Tahir" or "Muğlak İşler Müdürlüğü". As the last example shows, NEs may contain regular words. The task of identifying NEs in a document or sentence is called "Named Entity Recognition" (NER). It is a higher-level task than morphological analysis.

In its current form, Zemberek does not find NEs; it can only tell whether a word can be a proper noun. To find proper nouns or NEs in a sentence, the best approach is probably to use a full-fledged NER system.

Because Zemberek is a dictionary-based system, proper nouns need to be in its dictionary, and some are already included (about 25,000). However, in some cases it is hard to decide whether a word is a proper noun, so the system tries to guess whether a word can be one. It also parses proper nouns that are written without an initial capital letter. Finally, if a word is not in the dictionary but contains an apostrophe ('), Zemberek parses it by creating a proper noun DictionaryItem on the fly.
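The lookup behavior described above can be illustrated with a small, self-contained Java sketch. This is not Zemberek's implementation or API: the class `ProperNounSketch`, its `Item` record (a simplified, hypothetical stand-in for `DictionaryItem`) and the methods `addProperNoun` and `lookup` are all invented for illustration. The sketch assumes proper nouns are stored case-insensitively (so a lowercase spelling still matches) and treats the part of an unknown word before the apostrophe as a new proper noun. It leaves out Zemberek's guessing heuristics and the morphological analysis of the suffix after the apostrophe. Records require Java 16 or later.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; not part of Zemberek.
public class ProperNounSketch {

    // Simplified stand-in for a dictionary entry (NOT Zemberek's DictionaryItem).
    record Item(String lemma, boolean properNoun, boolean createdOnTheFly) {}

    // Turkish locale so that case mapping handles I/ı and İ/i correctly.
    private static final Locale TR = Locale.forLanguageTag("tr");

    // Lower-cased lemmas of the proper nouns known to the dictionary.
    private final Set<String> properNouns = new HashSet<>();

    void addProperNoun(String lemma) {
        properNouns.add(lemma.toLowerCase(TR));
    }

    // Returns a proper-noun item for the word's stem, or empty if none applies.
    Optional<Item> lookup(String word) {
        int apostrophe = word.indexOf('\'');
        // Stem is everything before the apostrophe, or the whole word if there is none.
        String stem = apostrophe >= 0 ? word.substring(0, apostrophe) : word;
        if (stem.isEmpty()) {
            return Optional.empty();
        }
        // Case-insensitive match: a known proper noun is found even without an initial capital.
        if (properNouns.contains(stem.toLowerCase(TR))) {
            return Optional.of(new Item(capitalize(stem), true, false));
        }
        // Unknown word containing an apostrophe: create a proper-noun item on the fly.
        if (apostrophe > 0) {
            return Optional.of(new Item(capitalize(stem), true, true));
        }
        return Optional.empty();
    }

    // Upper-cases only the first character, using Turkish rules (i -> İ).
    private static String capitalize(String s) {
        return s.substring(0, 1).toUpperCase(TR) + s.substring(1);
    }

    public static void main(String[] args) {
        ProperNounSketch dictionary = new ProperNounSketch();
        dictionary.addProperNoun("Kemal");
        System.out.println(dictionary.lookup("kemal'in"));
        System.out.println(dictionary.lookup("Zemberek'i"));
        System.out.println(dictionary.lookup("masa"));
    }
}
```

With "Kemal" in the dictionary, `lookup("kemal'in")` returns the existing entry despite the lowercase spelling, `lookup("Zemberek'i")` returns a new item flagged as created on the fly, and `lookup("masa")` returns nothing because the word is neither a known proper noun nor contains an apostrophe.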
