List of Dutch lemma/plural pairs for training/evaluating (de)pluralizers

CentreForDigitalHumanities/dutch-plurals

Dutch Plurals and Articles

This repository contains a list of manually annotated and verified Dutch plural/singular (lemma) pairs. The annotation was done by Henk Pander Maat, and the list can be found in input.tsv.

It also contains a list of articles in gender.tsv.

Using transform.py, these files can be converted into output.tsv, which can be used as input for froggen.
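For working with the pair list outside of transform.py, a minimal sketch of reading lemma/plural pairs from a tab-separated file might look as follows. It assumes a two-column layout (lemma first, plural second) without a header row; the actual column layout of input.tsv is not described here, so check the file before relying on this. The helper name `read_pairs` is hypothetical and not part of this repository.

```python
import csv
import io


def read_pairs(handle):
    """Yield (lemma, plural) tuples from a tab-separated file object.

    Assumption: column 0 holds the lemma and column 1 the plural.
    Rows that are empty or have fewer than two columns are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue
        yield row[0].strip(), row[1].strip()


# In-memory sample standing in for a real file such as input.tsv.
sample = io.StringIO("huis\thuizen\nboek\tboeken\n\n")
pairs = list(read_pairs(sample))
print(pairs)
```

To read a real file, pass an open handle instead of the sample, for example `open("input.tsv", encoding="utf-8", newline="")`.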

Wiktionary

The machine-readable Wiktionary data from Tatu Ylonen was used to expand these lists and to create an initial file containing the articles.

Considerations for Article Usage

For uncapitalized words, a native speaker can usually determine the article easily. If a word takes no article (for example, months), the field is left blank. For capitalized words there is more room for ambiguity. Rivers, lakes, seas, mountains, deserts, inhabitants, streets, squares and languages have articles. Acronyms, cars and devices also generally have them. Countries, cities, regions and brand names generally do not. Nominal adjectives should be recorded as adjectives.
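The guidelines for capitalized words can be summarized as a small lookup. This is only an illustrative sketch: the category labels and the function `expects_article` are hypothetical and do not appear in gender.tsv or transform.py, and the word "generally" in the guidelines means that exceptions exist.

```python
# Hypothetical category labels, taken from the guidelines for capitalized words.
CATEGORIES_WITH_ARTICLE = {
    "river", "lake", "sea", "mountain", "desert", "inhabitant",
    "street", "square", "language",
    # "generally" have an article according to the guidelines:
    "acronym", "car", "device",
}
# "generally" do not have an article according to the guidelines:
CATEGORIES_WITHOUT_ARTICLE = {"country", "city", "region", "brand"}


def expects_article(category):
    """Return True or False per the guidelines, or None if the category is not covered."""
    key = category.strip().lower()
    if key in CATEGORIES_WITH_ARTICLE:
        return True
    if key in CATEGORIES_WITHOUT_ARTICLE:
        return False
    return None


print(expects_article("river"))    # True
print(expects_article("country"))  # False
print(expects_article("planet"))   # None
```

Returning `None` for unlisted categories keeps the sketch from guessing where the guidelines say nothing; such words still need a decision by an annotator.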

Nominal Adjectives

Words which can be used as nominal adjectives are tagged as 'ADJ' instead of 'N'. If they can also be used as normal nouns, both tags are added to the output file. For example, compare "Het Nederlands is een Germaanse taal." (N) with "Dat is typisch Nederlands." (ADJ).
