module__OgmiosTokenizer
#org.bibliome.alvisnlp.modules.OgmiosTokenizer
Tokenizes section contents according to the Ogmios tokenizer specifications.
org.bibliome.alvisnlp.modules.OgmiosTokenizer creates an annotation for each token found in the section contents according to the Ogmios tokenizer specifications and adds these annotations to the targetLayerName layer. Each created annotation has the feature tokenTypeFeature, set to one of the following values:
- alpha: for an alphabetic token;
- num: for a numeric token;
- sep: for a whitespace token;
- symb: for all other tokens.
If separatorTokens is false, org.bibliome.alvisnlp.modules.OgmiosTokenizer does not create annotations for whitespace tokens.
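The four token types and the effect of separatorTokens can be illustrated with a short Python sketch. This is not the module's implementation and does not reproduce the Ogmios specifications: the `Token` type, the `tokenize` function, and the regular expression are hypothetical, and details such as whether a run of whitespace yields one token or several are assumptions made for illustration.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for an annotation: offsets, surface form, type."""
    start: int
    end: int
    form: str
    token_type: str  # plays the role of the tokenTypeFeature value


# Assumed approximation of the four classes; the real Ogmios rules may differ.
_TOKEN_RE = re.compile(
    r"(?P<alpha>[^\W\d_]+)"  # run of letters
    r"|(?P<num>\d+)"         # run of digits
    r"|(?P<sep>\s+)"         # run of whitespace
    r"|(?P<symb>.)",         # any other single character
    re.DOTALL,
)


def tokenize(text: str, separator_tokens: bool = True) -> Iterator[Token]:
    """Yield typed tokens; skip whitespace tokens if separator_tokens is False."""
    for match in _TOKEN_RE.finditer(text):
        token_type = match.lastgroup
        if token_type == "sep" and not separator_tokens:
            continue
        yield Token(match.start(), match.end(), match.group(), token_type)


if __name__ == "__main__":
    for token in tokenize("IL-2 binds 42", separator_tokens=False):
        print(token)
```

With `separator_tokens=False` the whitespace tokens are dropped but the offsets of the remaining tokens are unchanged, which mirrors how annotations keep their positions in the section contents.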
Optional
Type: String
Name of the layer (targetLayerName) in which the token annotations are stored.
Optional
Type: String
Name of the token feature (tokenTypeFeature) in which the token type is stored (alpha, num, sep, symb).
Optional
Type: Mapping
Constant features to add to each annotation created by this module.
Default value: true
Type: Expression
Process only documents that satisfy this filter.
Default value: true
Type: Expression
Process only sections that satisfy this filter.
Default value: true
Type: Boolean
Whether annotations should be created for separator (whitespace) tokens (separatorTokens).