

Robert Bossy edited this page Jul 27, 2017 · 1 revision

org.bibliome.alvisnlp.modules.OgmiosTokenizer

Synopsis

Tokenizes the contents of sections according to the Ogmios tokenizer specifications.

Description

org.bibliome.alvisnlp.modules.OgmiosTokenizer creates an annotation for each token found in the section contents, according to the Ogmios tokenizer specifications, and adds these annotations to the targetLayerName layer. Each created annotation has the feature tokenTypeFeature set to one of the following values:

  • alpha: for an alphabetic token;
  • num: for a numeric token;
  • sep: for a whitespace token;
  • symb: for all other tokens.
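To make the four token types concrete, here is a minimal Java sketch of a character-class tokenizer. It assumes that alpha, num and sep tokens are maximal runs of letters, digits and whitespace respectively, and that every other character forms its own one-character symb token. This is an illustrative assumption, not a transcription of the Ogmios tokenizer specifications or of the AlvisNLP code; the class OgmiosTokenizerSketch and its Token record are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a tokenizer producing alpha, num, sep and symb tokens.
 * ASSUMPTION: alpha/num/sep tokens are maximal runs of one character class and
 * each remaining character is a single symb token. Not the Ogmios specification.
 */
public class OgmiosTokenizerSketch {

    /** A token: character offsets in the section contents, covered text, type. */
    public record Token(int start, int end, String text, String type) {}

    /** Classifies one character as alpha, num, sep or symb. */
    static String charType(char c) {
        if (Character.isLetter(c)) return "alpha";
        if (Character.isDigit(c)) return "num";
        if (Character.isWhitespace(c)) return "sep";
        return "symb";
    }

    /** Splits the contents into typed tokens, left to right. */
    public static List<Token> tokenize(String contents) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < contents.length()) {
            String type = charType(contents.charAt(i));
            int j = i + 1;
            // alpha, num and sep tokens extend over consecutive characters of the
            // same class; symb tokens always stay one character long.
            if (!type.equals("symb")) {
                while (j < contents.length() && charType(contents.charAt(j)).equals(type)) {
                    j++;
                }
            }
            tokens.add(new Token(i, j, contents.substring(i, j), type));
            i = j;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("IL-2 binds 42x.")) {
            System.out.println(t.start() + "-" + t.end() + " " + t.type() + " '" + t.text() + "'");
        }
    }
}
```

Under these assumptions, "IL-2 binds 42x." yields nine tokens: IL (alpha), - (symb), 2 (num), a space (sep), binds (alpha), a space (sep), 42 (num), x (alpha) and . (symb). The sketch classifies UTF-16 chars, so characters outside the Basic Multilingual Plane are not treated as single code points.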

If separatorTokens is false, org.bibliome.alvisnlp.modules.OgmiosTokenizer does not create annotations for whitespace tokens.

Parameters

targetLayerName

Optional

Type: String

Name of the layer in which to store the tokens.

tokenTypeFeature

Optional

Type: String

Name of the feature in which to store the token type (alpha, num, sep, symb).

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

Default value: true

Type: Expression

Only process documents that satisfy this filter.

Default value: true

Type: Expression

Process only sections that satisfy this filter.

separatorTokens

Default value: true

Type: Boolean

Whether separator (whitespace) tokens should be added.
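The layer, feature and separator parameters can be read together as a small step from typed tokens to annotations. The following Java sketch assumes the tokens already carry one of the four types; the names TypedToken, Annotation, annotate and constantFeatures are hypothetical and are not the AlvisNLP API, and the document and section filters are left out. It is a sketch under these assumptions, not the module's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: builds annotations from already-typed tokens according to
 * targetLayerName, tokenTypeFeature, separatorTokens and a map of constant features.
 * All type and method names are illustrative, not AlvisNLP's API.
 */
public class TokenAnnotationSketch {

    /** A token whose type (alpha, num, sep or symb) is already known. */
    public record TypedToken(String text, String type) {}

    /** A created annotation: its layer, its covered text and its features. */
    public record Annotation(String layer, String text, Map<String, String> features) {}

    public static List<Annotation> annotate(List<TypedToken> tokens,
                                            String targetLayerName,
                                            String tokenTypeFeature,
                                            Map<String, String> constantFeatures,
                                            boolean separatorTokens) {
        List<Annotation> annotations = new ArrayList<>();
        for (TypedToken token : tokens) {
            // When separatorTokens is false, whitespace tokens produce no annotation.
            if (!separatorTokens && token.type().equals("sep")) {
                continue;
            }
            // Start from the constant features, then store the token type under the
            // feature named by tokenTypeFeature (overriding a constant of the same name).
            Map<String, String> features = new LinkedHashMap<>(constantFeatures);
            features.put(tokenTypeFeature, token.type());
            annotations.add(new Annotation(targetLayerName, token.text(), features));
        }
        return annotations;
    }

    public static void main(String[] args) {
        List<TypedToken> tokens = List.of(
                new TypedToken("binds", "alpha"),
                new TypedToken(" ", "sep"),
                new TypedToken("42", "num"));
        // "tokens" and "type" are example values, not documented defaults.
        for (Annotation a : annotate(tokens, "tokens", "type", Map.of("source", "ogmios"), false)) {
            System.out.println(a);
        }
    }
}
```

With separatorTokens set to false, main prints two annotations (binds and 42) because the whitespace token is skipped. Letting the type feature override a constant feature with the same name is a choice made for this sketch only; the documentation above does not specify that precedence.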
