#org.bibliome.alvisnlp.modules.projectors.TyDIProjector
Projects terms from a TyDI export.
This module is obsolete; it is superseded by org.bibliome.alvisnlp.modules.trie.TyDIExportProjector.
org.bibliome.alvisnlp.modules.projectors.TyDIProjector reads the files of a TyDI text export, resolves all synonymies, and projects the terms onto sections.
The parameters lemmaFile, synonymsFile, quasiSynonymsFile, acronymsFile and typographicVariationsFile specify the paths to the corresponding files of the TyDI export.
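Resolving synonymies amounts to building a lookup table from every known term to its canonical lemma. The sketch below is not the module's implementation. This page does not describe the TyDI export file layout, so the sketch assumes a hypothetical format in which each line of a variant file reads `variant<TAB>target`, where target is either a lemma or another variant. It also treats synonyms, quasi-synonyms, acronyms and typographic variations alike. The names `read_pairs` and `resolve` are illustrative only.

```python
# Hypothetical sketch: the real TyDI export format is NOT described on this page.
# Assumed variant-file line layout: "variant<TAB>target", where target is a
# lemma or another variant.


def read_pairs(lines):
    """Parse assumed 'variant<TAB>target' lines, skipping blank ones."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        variant, target = line.split("\t", 1)
        pairs.append((variant, target))
    return pairs


def resolve(lemmas, *variant_files):
    """Map every term (lemma or variant) to its canonical lemma."""
    links = {}
    for lines in variant_files:
        for variant, target in read_pairs(lines):
            links[variant] = target
    # Each lemma is its own canonical form.
    canonical = {lemma: lemma for lemma in lemmas}
    for term in links:
        seen = set()
        current = term
        # Follow variant -> target links until a known term is reached.
        while current not in canonical and current in links:
            if current in seen:
                raise ValueError(f"synonym cycle at {term!r}")
            seen.add(current)
            current = links[current]
        if current in canonical:
            canonical[term] = canonical[current]
        # Otherwise the chain dangles; the term is left unresolved.
    return canonical


lemmas = ["heat shock protein"]
synonyms = ["HSP\theat shock protein\n"]
variations = ["heat-shock protein\theat shock protein\n", "H.S.P.\tHSP\n"]
table = resolve(lemmas, synonyms, variations)
# table["H.S.P."] == "heat shock protein"
```

Following chains lets a typographic variant of an acronym (`H.S.P.`) reach the lemma of the acronym it varies.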
The parameters normalizeSpace, ignoreCase, ignoreDiacritics and ignoreWhitespace control how entries are matched against the sections.
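One way to read the four matching flags is as normalizations applied to both the dictionary entries and the section text before comparison. This is a hedged sketch. The function `normalize_key` and the order in which it applies the flags are assumptions, not documented behavior of the module. A real projector would also have to map positions in the normalized text back to offsets in the original section, and that step is omitted here.

```python
import re
import unicodedata


def normalize_key(text, normalize_space=False, ignore_case=False,
                  ignore_diacritics=False, ignore_whitespace=False):
    """Build a comparison key; flag names mirror the module parameters (assumed semantics)."""
    if ignore_whitespace:
        # Drop every whitespace character.
        text = re.sub(r"\s+", "", text)
    elif normalize_space:
        # Collapse whitespace runs into one space and trim both ends.
        text = re.sub(r"\s+", " ", text).strip()
    if ignore_diacritics:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if ignore_case:
        text = text.lower()
    return text


print(normalize_key("  Escherichia   Coli ", normalize_space=True, ignore_case=True))
print(normalize_key("Bactérie", ignore_diacritics=True))
```

An entry matches when its key equals the key of a span of section text computed with the same flags.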
The subject parameter specifies which text of the section should be matched. There are two options:
- entries are matched against the contents of the section; in this case, subject can also control whether match boundaries must coincide with word delimiters;
- entries are matched against the values of a given feature of the annotations in a given layer, joined with whitespace separators; this way, entries can be matched against word lemmas, for instance.
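Both kinds of subject can be illustrated with plain strings. In the sketch below, `match_contents` finds entries in the section text and can reject matches whose boundaries fall inside a word. `feature_subject` joins one feature value (here a lemma) of each annotation with single spaces, so entries can be matched on lemmas and mapped back to the annotations they cover. Annotations are plain dictionaries here. All function names, and the use of `str.isalnum` as the word-character test, are assumptions made for illustration.

```python
def is_word_char(c):
    """Assumed word-character test: letters and digits."""
    return c.isalnum()


def match_contents(text, entries, word_boundaries=True):
    """Return sorted (start, end, entry) triples for entry occurrences in text."""
    matches = []
    for entry in entries:
        if not entry:
            continue  # an empty entry would match everywhere
        start = text.find(entry)
        while start != -1:
            end = start + len(entry)
            before_ok = start == 0 or not is_word_char(text[start - 1])
            after_ok = end == len(text) or not is_word_char(text[end])
            if not word_boundaries or (before_ok and after_ok):
                matches.append((start, end, entry))
            start = text.find(entry, start + 1)
    return sorted(matches)


def feature_subject(annotations, feature):
    """Join one feature of each annotation with single spaces.

    Also returns the (start, end) span of each value in the joined string.
    """
    parts, spans, pos = [], [], 0
    for ann in annotations:
        value = ann[feature]
        parts.append(value)
        spans.append((pos, pos + len(value)))
        pos += len(value) + 1  # +1 for the separating space
    return " ".join(parts), spans


def covered_annotations(spans, start, end):
    """Indices of annotations whose value lies entirely inside [start, end)."""
    return [i for i, (s, e) in enumerate(spans) if s >= start and e <= end]


# Contents subject: "category" does not yield a whole-word match for "cat".
text = "The cat sat on the category mat."
print(match_contents(text, ["cat"]))  # [(4, 7, 'cat')]

# Feature subject: match an entry against word lemmas.
words = [{"form": "Infected", "lemma": "infect"},
         {"form": "mice", "lemma": "mouse"},
         {"form": "died", "lemma": "die"}]
lemma_text, spans = feature_subject(words, "lemma")
for start, end, entry in match_contents(lemma_text, ["infect mouse"]):
    print(entry, [words[i]["form"] for i in covered_annotations(spans, start, end)])
# infect mouse ['Infected', 'mice']
```

Recording each value's span while joining is what lets a match in the joined string be traced back to the word annotations it spans.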
org.bibliome.alvisnlp.modules.projectors.TyDIProjector creates an annotation for each matched entry and adds these annotations to the layer named targetLayerName. The created annotations will have a feature named canonicalFormFeature containing the canonical form of the matched term. In addition, the created annotations will have the feature keys and values defined in constantAnnotationFeatures.
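The annotation step can be sketched with a minimal data model. The `Annotation` class, the `layers` dictionary keyed by layer name, and `make_annotations` are hypothetical stand-ins for AlvisNLP's own corpus classes. Only the roles of targetLayerName, canonicalFormFeature (default `lemma`) and constantAnnotationFeatures come from this page. The sketch also assumes that the canonical form wins when a constant feature uses the same key.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation: a text span plus string features."""
    start: int
    end: int
    features: dict = field(default_factory=dict)


def make_annotations(matches, canonical, layers, target_layer_name,
                     canonical_form_feature="lemma",
                     constant_annotation_features=None):
    """Add one annotation per (start, end, term) match to the target layer."""
    layer = layers.setdefault(target_layer_name, [])
    for start, end, term in matches:
        # Constant features first, then the canonical form (assumed precedence).
        features = dict(constant_annotation_features or {})
        features[canonical_form_feature] = canonical[term]
        layer.append(Annotation(start, end, features))
    return layer


layers = {}
canonical = {"HSP": "heat shock protein"}
created = make_annotations([(10, 13, "HSP")], canonical, layers, "terms",
                           constant_annotation_features={"source": "TyDI"})
print(created[0])
```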
Optional
Type: SourceStream
Path to the file containing lemmas.
Optional
Type: SourceStream
Path to the merged terms file.
Optional
Type: SourceStream
Path to the quasi-synonyms file.
Optional
Type: SourceStream
Path to the synonyms file.
Optional
Type: String
Name of the layer where match annotations are added.
Optional
Type: SourceStream
Path to the acronyms file.
Optional
Type: Mapping
Constant features to add to each annotation created by this module.
Optional
Type: TargetStream
Path to the file where the dictionary is saved.
Optional
Type: SourceStream
Path to the typographic variations file.
Default value: lemma
Type: String
Name of the feature in which to store the canonical form of the term.
Default value: true
Type: Expression
Only process documents that satisfy this filter.
Default value: false
Type: Boolean
Whether to stop when a duplicate entry is seen.
Default value: false
Type: Boolean
Match ignoring case.
Default value: false
Type: Boolean
Match ignoring diacritics.
Default value: false
Type: Boolean
Match ignoring whitespace characters.
Default value: add
Type: MultipleValueAction
Action to take when multiple entries with the same key are seen.
Default value: false
Type: Boolean
Match normalizing whitespace.
Default value: true
Type: Expression
Process only sections that satisfy this filter.
Default value: org.bibliome.alvisnlp.modules.projectors.ContentsSubject
Type: Subject
Subject on which to project the dictionary.