Weights
From the end user's perspective, a weight tells how probable a word or its analysis is. The weight can be thought of as a penalty, i.e. words or analyses with a bigger weight are less probable. Accordingly, when there are several analyses for a word, they are printed in ascending order of weight so that the most probable ones come first. It is possible to weight lexemes or grammatical rules, which makes it easier to disambiguate among several possible analyses for a given word. Weights can also be used when generating word forms.
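The ordering described above can be sketched in plain Python. This is only an illustration and does not call HFST: the analysis strings and weights are invented, and `rank_analyses` is a hypothetical helper name.

```python
# Hypothetical analyses for one surface form, paired with their weights.
# The strings and numbers are invented for illustration only.
analyses = [
    ("walk+V+Inf", 2.5),
    ("walk+N+Pl", 1.0),
    ("walk+V+Pres+3Sg", 0.5),
]

def rank_analyses(pairs):
    """Return (analysis, weight) pairs in ascending order of weight."""
    return sorted(pairs, key=lambda pair: pair[1])

ranked = rank_analyses(analyses)
for analysis, weight in ranked:
    # The lowest weight (smallest penalty) is printed first.
    print(f"{analysis}\t{weight}")
```

Because a weight acts as a penalty, the first item of `ranked` is the most probable analysis.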
For weights, we use the tropical semiring. When there are several paths or transitions that differ only by their weight, the tropical semiring chooses the one with the lowest weight. All HFST command line tools and functions support weights by default. If weights are not specified anywhere, the tools and functions simply operate with zero weights. There are three back-end implementation formats available for almost all HFST tools and functions: sfst, openfst-tropical and foma. Of these, openfst-tropical is the weighted one and is used by default.
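As a rough sketch of what the tropical semiring does, the following plain Python models its two operations: alternatives are combined with `min`, and weights along a path are combined with ordinary addition. The function names and example paths are assumptions made for illustration, not part of the HFST API.

```python
import math

# Tropical semiring: "addition" is min, "multiplication" is ordinary +.
TROPICAL_ZERO = math.inf  # identity for min: no path at all
TROPICAL_ONE = 0.0        # identity for +: a path that carries no penalty

def tropical_plus(a, b):
    """Combine two alternative paths by keeping the lower weight."""
    return min(a, b)

def tropical_times(a, b):
    """Extend a path: weights along it are added together."""
    return a + b

def path_weight(weights):
    """Total weight of one path, given its transition weights."""
    total = TROPICAL_ONE
    for w in weights:
        total = tropical_times(total, w)
    return total

def best_weight(paths):
    """Lowest total weight among several alternative paths."""
    best = TROPICAL_ZERO
    for p in paths:
        best = tropical_plus(best, path_weight(p))
    return best

# Two hypothetical paths for the same string pair, differing only in weight.
paths = [[0.5, 1.5], [0.25, 0.5]]
lowest = best_weight(paths)
print(lowest)
```

A path with no weighted transitions gets the weight `TROPICAL_ONE`, i.e. zero, which matches the behaviour described above when no weights are specified.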
Weights can be specified in regular expressions when using the command line tools hfst-regexp2fst and hfst-xfst as well as the function hfst.regexp of the Python API. The mechanism for adding weights is the :: operator, which can be used for assigning weights to individual transitions or to any regular expression in brackets, i.e.
a::weight
a:b::weight
[ any regular expression ]::weight
The weights are most often from the tropical semiring. A tropical weight is represented as a float, i.e. one or more digits that may be preceded by a minus or plus sign and followed by a decimal point and at least one digit. For example, the regular expression
[ a b:c::0.5 d::0.3 ]::0.2
will produce a transducer that maps abd to acd with weight 0.5 + 0.3 + 0.2 = 1.0. In this example, we basically have a transition a:a with no weight, followed by a transition b:c with weight 0.5, followed by a transition d:d with weight 0.3, leading to a final state with weight 0.2. However, operations that are called afterwards, e.g. minimization, may change the exact positions of the weights in the transducer.
A more complex expression
[ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7
will yield a transducer that maps foobaz to barbaz with weight -1.15 + 0.15 + 0.5 + 0.7 = 0.2.
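The arithmetic in the two examples above can be checked with a small plain-Python sketch that extracts every `::weight` annotation and adds them up. It does not use HFST. The pattern `WEIGHT` is an assumption based on the description of the float format, not the pattern used by the HFST lexer, and `total_weight` is a hypothetical helper. Summing all annotations is valid only because both expressions are plain concatenations with a single path; it would not be correct for expressions that contain union or iteration.

```python
import re

# Weight literal: optional sign, digits, then an optional decimal point
# followed by at least one digit. Assumed from the prose description only.
WEIGHT = re.compile(r"::([+-]?\d+(?:\.\d+)?)")

def total_weight(expression):
    """Sum every ::weight annotation in a concatenation-only expression."""
    return sum(float(w) for w in WEIGHT.findall(expression))

first = total_weight("[ a b:c::0.5 d::0.3 ]::0.2")
second = total_weight("[ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7")

# Round before printing, since binary floats may carry tiny rounding errors.
print(round(first, 6), round(second, 6))
```

A single colon, as in `b:c`, is not matched by the pattern, so only the weight annotations contribute to the sum.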
Note that weights can be used only with the implementation openfst-tropical (and, in principle, openfst-log, which is not very well supported). Inserting weights with the unweighted implementations, i.e. sfst or foma, has no effect.
Tool | Usage |
---|---|
[[hfst-lexc | HfstLexc]] |
[[hfst-twolc | HfstTwolc]] |
[[hfst-strings2fst | HfstStrings2Fst]] |
[[hfst-txt2fst | HfstTxt2Fst]] |
There are some issues with weights that must be considered when specifying them or applying certain operations on weighted transducers. See our kitwiki pages for more information.