
Weights

eaxelson edited this page Aug 30, 2017 · 7 revisions


From the end user's perspective, a weight tells how probable a word or its analysis is. A weight can be thought of as a penalty, i.e. words/analyses with a bigger weight are less probable. Accordingly, when a word has several analyses, they are printed in ascending order of weight so that the most probable ones come first. Lexemes or grammatical rules can be weighted, which makes it easier to disambiguate among several possible analyses for a given word. Weights can also be used when generating word forms.

For weights, we use the tropical semiring: the weights along a path are added together, and when there are several paths or transitions that differ only in their weight, the one with the lowest weight is chosen. All HFST command line tools and functions support weights by default; if no weights are specified anywhere, the tools/functions simply operate with zero weights. Almost all HFST tools/functions are available with three back-end implementation formats: sfst, openfst-tropical and foma. Of these, openfst-tropical is the weighted one and is used by default.
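The two operations of the tropical semiring can be sketched in plain Python. This is only a model of the arithmetic, not HFST code: the function names `times`, `plus`, `path_weight` and `best_weight` and the example analyses are hypothetical. Extending a path adds weights, choosing among alternatives takes the minimum, and sorting by total weight gives the "most probable first" order described above.

```python
from functools import reduce

# Hypothetical model of tropical-semiring arithmetic (not part of HFST).

def times(a: float, b: float) -> float:
    """Semiring 'multiplication': extend a path by one weighted step."""
    return a + b

def plus(a: float, b: float) -> float:
    """Semiring 'addition': choose the better (lower-weight) alternative."""
    return min(a, b)

ONE = 0.0            # identity for times: a zero-weight step
ZERO = float("inf")  # identity for plus: no path at all

def path_weight(step_weights):
    """Total weight of one path: all step weights added together."""
    return reduce(times, step_weights, ONE)

def best_weight(paths):
    """Weight of the best path among several alternatives."""
    return reduce(plus, (path_weight(p) for p in paths), ZERO)

# Hypothetical analyses of one word, each given as the weights along its path.
analyses = {
    "analysis-A": [0.5, 1.0],
    "analysis-B": [0.25],
    "analysis-C": [2.0, 0.0, 0.5],
}

# Ascending order of weight: the most probable analysis comes first.
ranked = sorted(analyses, key=lambda name: path_weight(analyses[name]))
for name in ranked:
    print(name, path_weight(analyses[name]))
print("best:", best_weight(analyses.values()))
```

Here analysis-B (weight 0.25) is printed first, followed by analysis-A (1.5) and analysis-C (2.5).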

Using weights in regular expressions

Weights can be specified in regular expressions when using the command line tools hfst-regexp2fst and hfst-xfst, as well as the function hfst.regexp of the Python API. Weights are added with the :: operator, which can assign a weight to an individual transition or to any bracketed regular expression:

a::weight
a:b::weight
[ any regular expression ]::weight

The weights are most often from the tropical semiring. A tropical weight is written as a float, i.e. one or more digits, which may be preceded by a minus or plus sign and followed by a decimal point and at least one digit. For example, the regular expression
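The weight syntax described above can be expressed as a regular expression. The following Python check is our own reading of that description, not HFST's actual lexer; in particular, treating the fractional part as optional (so that `2` is accepted) is an assumption.

```python
import re

# Our reading of the weight syntax: optional sign, digits, and an optional
# decimal point followed by at least one digit. HFST's own regexp lexer may
# accept a slightly different set of strings.
WEIGHT_RE = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")

def is_weight(token: str) -> bool:
    """Return True if the whole token matches the assumed weight syntax."""
    return WEIGHT_RE.fullmatch(token) is not None

for tok in ["0.5", "-1.15", "+0.15", "2", "0,5", "1."]:
    print(tok, is_weight(tok))
```

Note that a comma is not a decimal separator here: `0,5` is rejected.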

[ a b:c::0.5 d::0.3 ]::0.2

will produce a transducer that maps abd to acd with weight 0.5 + 0.3 + 0.2 = 1.0. In this example, we basically have a transition a:a with zero weight, followed by a transition b:c with weight 0.5, followed by a transition d:d with weight 0.3, leading to a final state with weight 0.2. However, operations applied afterwards, e.g. minimization, may move the weights to different positions in the transducer.
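The path described above can be written out by hand to check the arithmetic. This Python sketch does not build a real transducer; the `Arc` class and the hand-built path are hypothetical stand-ins for the transitions and final weight produced by `[ a b:c::0.5 d::0.3 ]::0.2`.

```python
from dataclasses import dataclass

@dataclass
class Arc:
    """Hypothetical stand-in for one transition: input:output::weight."""
    inp: str
    out: str
    weight: float = 0.0

# Hand-built path for [ a b:c::0.5 d::0.3 ]::0.2 (not produced by HFST).
path = [Arc("a", "a"), Arc("b", "c", 0.5), Arc("d", "d", 0.3)]
final_weight = 0.2  # the bracket weight ends up on the final state

upper = "".join(arc.inp for arc in path)
lower = "".join(arc.out for arc in path)
# Tropical semiring: weights along a path are added together.
total = sum(arc.weight for arc in path) + final_weight
print(f"{upper} -> {lower} with weight {total:.1f}")
```

The printed total is 1.0; the comparison in floating point should use a tolerance, since 0.5 + 0.3 + 0.2 need not be exactly 1.0 in binary.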

A more complex expression

[ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7

will yield a transducer that maps foobaz to barbaz with weight -1.15 + 0.15 + 0.5 + 0.7 = 0.2.
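How nested bracket weights accumulate can be modelled with a small expression tree. This is a hypothetical sketch covering only single-path expressions built from symbol pairs and concatenation; `Pair`, `Seq` and `evaluate` are illustrative names, symbols are modelled as plain strings, and the model says nothing about where HFST actually places the weights inside the transducer.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Pair:
    """a:b::w, a single symbol pair with an optional weight."""
    inp: str
    out: str
    weight: float = 0.0

@dataclass
class Seq:
    """[ e1 e2 ... ]::w, a concatenation with an optional weight."""
    parts: List["Expr"]
    weight: float = 0.0

Expr = Union[Pair, Seq]

def evaluate(e: Expr):
    """Return (input, output, weight) for a single-path expression.

    The weight of a bracketed expression is added to the weights of
    everything inside it, as in the tropical semiring.
    """
    if isinstance(e, Pair):
        return e.inp, e.out, e.weight
    inp, out, w = "", "", e.weight
    for part in e.parts:
        i, o, pw = evaluate(part)
        inp, out, w = inp + i, out + o, w + pw
    return inp, out, w

# [ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7
expr = Seq([Seq([Pair("foo", "bar", -1.15)], 0.15), Pair("baz", "baz", 0.5)], 0.7)
inp, out, w = evaluate(expr)
print(inp, out, round(w, 2))
```

Rounding is used only for display; the sum -1.15 + 0.15 + 0.5 + 0.7 is 0.2 up to floating-point error.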

Note that weights can be used only with the openfst-tropical implementation (and, in principle, with openfst-log, which is not very well supported). Inserting weights when using an unweighted implementation, i.e. sfst or foma, has no effect.

Using weights in other tools

The use of weights in the following tools is described on their own pages:

Tool               Page
hfst-lexc          HfstLexc
hfst-twolc         HfstTwolc
hfst-strings2fst   HfstStrings2Fst
hfst-txt2fst       HfstTxt2Fst

Shortcomings and caveats

There are some issues with weights that must be considered when specifying them or when applying certain operations to weighted transducers. See our kitwiki pages for more information.