module__TreeTagger
#org.bibliome.alvisnlp.modules.treetagger.TreeTagger
Runs tree-tagger.
org.bibliome.alvisnlp.modules.treetagger.TreeTagger applies tree-tagger to the annotations in wordLayerName by generating an appropriate input file. This file contains one line for each annotation. The first column, the token surface form, is the value of the formFeature feature. The second column, the token's predefined POS tag, is the value of the posFeature feature. The third column, the token's predefined lemma, is the value of the lemmaFeature feature. If posFeature or lemmaFeature are not defined, then the second and third columns are left blank.
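The three-column layout described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name `input_line`, the use of plain dictionaries for annotations, and the tab as column separator are all assumptions.

```python
def input_line(features, form_feature="form", pos_feature=None, lemma_feature=None):
    """Build one input line: surface form, predefined POS tag, predefined lemma.

    `features` is a plain dict standing in for an annotation's features
    (an assumption made for this sketch). Missing values give blank columns.
    """
    form = features.get(form_feature, "")
    pos = features.get(pos_feature, "") if pos_feature else ""
    lemma = features.get(lemma_feature, "") if lemma_feature else ""
    # Tab-separated columns are assumed here.
    return "\t".join([form, pos, lemma])


words = [
    {"form": "cats", "pos": "NNS", "lemma": "cat"},  # POS and lemma predefined
    {"form": "sleep"},                               # nothing predefined
]
lines = [input_line(w, "form", "pos", "lemma") for w in words]
```

In this sketch the second annotation yields a line with only the first column filled in.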
The tree-tagger binary is specified by treeTaggerExecutable and the language model to use is specified by parFile. Additionally a lexicon file can be given through lexiconFile.
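As a rough sketch of how the two paths might be combined into a command line: the stand-alone tree-tagger program takes its options first, then the parameter file, then the input and output files. The `-token` and `-lemma` options ask it to print the token and the lemma next to the tag. Whether the module passes exactly these options is an assumption, and the helper name and all paths below are hypothetical.

```python
def build_command(tree_tagger_executable, par_file, input_file, output_file):
    """Assemble an argument list for tree-tagger (assumed invocation).

    Argument order: options, parameter file, input file, output file.
    """
    return [
        tree_tagger_executable,
        "-token",   # print the token in the output
        "-lemma",   # print the lemma in the output
        par_file,
        input_file,
        output_file,
    ]


# Hypothetical paths, for illustration only.
command = build_command("/opt/treetagger/bin/tree-tagger",
                        "/opt/treetagger/lib/english.par",
                        "corpus.in", "corpus.out")
```

Such a list could be handed to `subprocess.run(command, check=True)` once the input file has been written.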
If sentenceLayerName is defined, then org.bibliome.alvisnlp.modules.treetagger.TreeTagger considers annotations in this layer as sentences. Sentence boundaries are reinforced by providing tree-tagger an additional end-of-sentence marker.
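One way to picture the reinforcement of sentence boundaries is to emit an extra marker line after the last word of each sentence. The sketch below assumes sentences are given as lists of input lines; the marker string `<EOS>` and the helper name are hypothetical, since the source does not say which marker the module uses.

```python
def with_sentence_markers(sentences, eos_marker="<EOS>"):
    """Flatten sentences into one list of lines, adding a marker after each.

    `eos_marker` is a placeholder value, not the module's actual marker.
    """
    out = []
    for sentence in sentences:
        out.extend(sentence)
        out.append(eos_marker)
    return out


marked = with_sentence_markers([["The", "cat", "sleeps"], ["It", "purrs"]])
```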
Once tree-tagger has processed the corpus, org.bibliome.alvisnlp.modules.treetagger.TreeTagger adds the predicted POS tag and lemma to the respective posFeature and lemmaFeature features of the corresponding annotations.
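The write-back step can be sketched as below, assuming tree-tagger's output has one tab-separated line per token (token, POS tag, lemma) in the same order as the input, and that unknown lemmas are reported as `<unknown>`. The helper name, the dict-based annotations, and the `no_unknown_lemma` flag (standing in for the option that replaces unknown lemmas with the surface form) are assumptions of this sketch.

```python
UNKNOWN_LEMMA = "<unknown>"  # lemma value assumed for unknown words


def apply_predictions(annotations, output_lines, form_feature="form",
                      pos_feature="pos", lemma_feature="lemma",
                      no_unknown_lemma=False):
    """Copy predicted POS tags and lemmas onto the matching annotations."""
    for annotation, line in zip(annotations, output_lines):
        _token, pos, lemma = line.rstrip("\n").split("\t")
        if no_unknown_lemma and lemma == UNKNOWN_LEMMA:
            # Fall back to the surface form when the lemma is unknown.
            lemma = annotation[form_feature]
        annotation[pos_feature] = pos
        annotation[lemma_feature] = lemma
    return annotations


tokens = [{"form": "cats"}, {"form": "blorf"}]
output = ["cats\tNNS\tcat", "blorf\tNN\t<unknown>"]
apply_predictions(tokens, output, no_unknown_lemma=True)
```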
If recordDir and recordFeatures are both defined, then tree-tagger predictions are written to the recordDir directory, one file per section. recordFeatures is an array of feature names to record. An additional feature n is recognized as the annotation ordinal in the section.
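A record line could be built along these lines. This is only a sketch: the helper name, the tab separator, and the starting value of the ordinal are assumptions, as the source does not specify the file layout.

```python
def record_line(ordinal, features, record_features):
    """Format one annotation for a result file.

    The special name "n" is replaced by the annotation's ordinal in the
    section; every other name is looked up in the annotation's features.
    """
    values = [
        str(ordinal) if name == "n" else features.get(name, "")
        for name in record_features
    ]
    return "\t".join(values)  # tab-separated output is assumed


line = record_line(3, {"form": "cats", "pos": "NNS", "lemma": "cat"},
                   ["n", "form", "pos"])
```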
Optional
Type: InputFile
Path to the language model file.
Optional
Type: ExecutableFile
Path to the tree-tagger executable file.
Optional
Type: Mapping
Constant features to add to each annotation created by this module.
Optional
Type: SourceStream
Path to a tree-tagger lexicon file. If set, the lexicon is applied to the corpus before tree-tagger processes it.
Optional
Type: OutputDirectory
Path to the directory where to write tree-tagger result files (one file per section).
Optional
Type: String[]
List of attributes to display in result files.
Default value: true
Type: Expression
Only process documents that satisfy this filter.
Default value: form
Type: String
Name of the feature denoting the token surface form.
Default value: ISO-8859-1
Type: String
Tree-tagger input corpus character set.
Default value: lemma
Type: String
Name of the feature to set with the lemma.
Default value: false
Type: Boolean
Whether to replace unknown lemmas with the surface form.
Default value: ISO-8859-1
Type: String
Tree-tagger output character set.
Default value: pos
Type: String
Name of the feature to set with the POS tag.
Default value: UTF-8
Type: String
Character encoding of the result files.
Default value: true
Type: Expression
Process only sections that satisfy this filter.
Default value: sentences
Type: String
Name of the layer containing sentence annotations. Sentence boundaries are reinforced.
Default value: words
Type: String
Name of the layer containing the word annotations.