Skip to content

HfstTransducer

eaxelson edited this page Feb 8, 2018 · 25 revisions

class HfstTransducer

A synchronous finite-state transducer.


Argument handling

Transducer functions modify their calling object and return a reference to the calling object after modification, unless otherwise mentioned. Transducer arguments are usually not modified.

# transducer is reversed
transducer.reverse()
# transducer2 is not modified, but a copy of it is disjuncted with
# transducer1
transducer1.disjunct(transducer2)
# a chain of functions is possible
transducer.reverse().determinize().reverse().determinize()

Implementation types

Currently, an HfstTransducer has three implementation types that are well supported. When an HfstTransducer is created, its type is defined with an argument. For functions that take a transducer as an argument, the type of the calling transducer must be the same as the type of the argument transducer:

# this will cause a TransducerTypeMismatchException:
tropical_transducer.disjunct(foma_transducer)
# this works, but weights are lost in the conversion
tropical_transducer.convert(hfst.ImplementationType.SFST_TYPE).disjunct(sfst_transducer)
# this works, information is not lost
tropical_transducer.disjunct(sfst_transducer.convert(hfst.ImplementationType.TROPICAL_OPENFST_TYPE))

Creating transducers

With HfstTransducer constructors it is possible to create empty, epsilon, one-transition and single-path transducers. Transducers can also be created from scratch with HfstBasicTransducer and converted to an HfstTransducer. More complex transducers can be combined from simple ones with various functions. class HfstTransducer:

Whether HFST is linked to the transducer library needed by implementation type type. is_implementation_type_available(type):


__init__ (self)

Create an empty transducer.

tr = hfst.HfstTransducer()
assert(tr.compare(hfst.empty_fst()))

__init__ (self, another)

Create a deep copy of HfstTransducer another or a transducer equivalent to HfstBasicTransducer another.

  • another An HfstTransducer or HfstBasicTransducer.

An example:

tr1 = hfst.regex('foo bar foo')
tr2 = hfst.HfstTransducer(tr1)
tr2.substitute('foo','FOO')
tr1.concatenate(tr2)

__init__ (self, t, type)

Create an HFST transducer equivalent to HfstBasicTransducer t. The type of the created transducer is defined by type.

  • t An HfstBasicTransducer.
  • type The type of the resulting transducer.

If you want to use the default type, you can just call hfst.HfstTransducer(fsm)


copy (self)

Return a deep copy of the transducer.

tr = hfst.regex('[foo:bar::0.3]*')
TR = tr.copy()
assert(tr.compare(TR))

set_name (self, name)

Rename the transducer name.

  • name The name of the transducer.

See also: get_name


get_name (self)

Get the name of the transducer.

See also: set_name


__str__ (self)

An AT&T representation of the transducer.

Defined for print command. An example:

>>> print(hfst.regex('[foo:bar::2]+'))
0       1       foo     bar     2.000000
1       1       foo     bar     2.000000
1       0.000000

Todo: Works only for small transducers.


prune (self)

Make transducer coaccessible.

A transducer is coaccessible iff there is a path from every state to a final state.


set_property (self, property, value)

Set arbitrary string property property to value.

  • property A string naming the property.
  • value A string expressing the value of property.

Note: set_property('name', 'name of the transducer') equals set_name('name of the transducer').

Note: While this function is capable of creating endless amounts of arbitrary metadata, it is suggested that property names are drawn from central repository, or prefixed with "x-". A property that does not follow this convention may affect the behavior of transducer in future releases.


get_property (self, property)

Get arbitrary string propert @a property.

  • property The name of the property whose value is returned.

get_property('name') works like get_name().


get_properties (self)

Get all properties from the transducer.

Return: A dictionary whose keys are properties and whose values are the values of those properties.


get_alphabet (self)

Get the alphabet of the transducer.

The alphabet is defined as the set of symbols known to the transducer.

Return: A tuple of strings.


insert_to_alphabet (self, symbol)

Explicitly insert symbol to the alphabet of the transducer.

  • symbol The symbol (string) to be inserted.

Note: Usually this function is not needed since new symbols are added to the alphabet by default.


remove_from_alphabet (self, symbol)

Remove symbol from the alphabet of the transducer.

  • symbol The symbol (string) to be removed.

Precondition: symbol does not occur in any transition of the transducer.

Note: Use with care, removing a symbol that occurs in a transition of the transducer can have unexpected results.


eliminate_flag (self, symbol)

Eliminate flag diacritic symbol from the transducer.

  • symbol The flag to be eliminated. TODO: explain more.

An equivalent transducer with no flags symbol.


eliminate_flags (self, symbols)

Eliminate flag diacritics listed in symbols from the transducer.

  • symbols The flags to be eliminated. TODO: explain more.

An equivalent transducer with no flags listed in symbols.


is_automaton (self)

Whether each transition in the transducer has equivalent input and output symbols. Note: Transition with hfst.UNKNOWN on both sides IS NOT a transition with equivalent input and output symbols. Note: Transition with hfst.IDENTITY on both sides IS a transition with equivalent input and output symbols.


is_cyclic (self)

Whether the transducer is cyclic.


get_type (self)

The implementation type of the transducer. Return: hfst.ImplementationType


compare (self, another)

Whether this transducer and another are equivalent.

  • another The compared transducer. Precondition: self and another must have the same implementation type.

Two transducers are equivalent iff they accept the same input/output string pairs with the same weights and the same alignments. Note: For weighted transducers, the function often returns false negatives due to weight precision issues.


remove_epsilons (self)

Remove all epsilon:epsilon transitions from the transducer so that the resulting transducer is equivalent to the original one.


determinize (self)

Determinize the transducer.

Determinizing a transducer yields an equivalent transducer that has no state with two or more transitions whose input:output symbol pairs are the same.


number_of_states (self)

The number of states in the transducer.


number_of_arcs (self)

The number of transitions in the transducer.


write (self, ostr)

Write the transducer in binary format to ostr.

  • ostr A hfst.HfstOutputStream where the transducer is written.

write_att (self, f, write_weights=True)

Write the transducer in AT&T format to file f, write_weights defined whether weights are written.

  • f A python file where transducer is written.
  • write_weights Whether weights are written.

write_prolog (self, f, name, write_weights=True)

Write the transducer in prolog format with name name to file f, write_weights defined whether weights are written.

  • f A python file where the transducer is written.
  • name The name of the transducer that must be given in a prolog file.
  • write_weights Whether weights are written.

minimize (self)

Minimize the transducer.

Minimizing a transducer yields an equivalent transducer with the smallest number of states.

Known bugs: OpenFst's minimization algorithm seems to add epsilon transitions to weighted transducers?


n_best (self, n)

Extract n best paths of the transducer.

In the case of a weighted transducer (TROPICAL_OPENFST_TYPE or LOG_OPENFST_TYPE, best paths are defined as paths with the lowest weight. In the case of an unweighted transducer (SFST_TYPE or FOMA_TYPE), the function returns random paths.

This function is not implemented for FOMA_TYPE or SFST_TYPE. If this function is called by an HfstTransducer of type FOMA_TYPE or SFST_TYPE, it is converted to TROPICAL_OPENFST_TYPE, paths are extracted and it is converted back to FOMA_TYPE or SFST_TYPE. If HFST is not linked to OpenFst library, an hfst.exceptions.ImplementationTypeNotAvailableException is thrown.


repeat_star (self)

A concatenation of N transducers where N is any number from zero to infinity.


repeat_plus (self)

A concatenation of N transducers where N is any number from one to infinity.


repeat_n (self, n)

A concatenation of n transducers.


repeat_n_minus (self, n)

A concatenation of N transducers where N is any number from zero to n, inclusive.


repeat_n_plus (self, n)

A concatenation of N transducers where N is any number from n to infinity, inclusive.


repeat_n_to_k (self, n, k)

A concatenation of N transducers where N is any number from n to k, inclusive.


optionalize (self)

Disjunct the transducer with an epsilon transducer.


invert (self)

Swap the input and output symbols of each transition in the transducer.


reverse (self)

Reverse the transducer.

A reverted transducer accepts the string 'n(0) n(1) ... n(N)' iff the original transducer accepts the string 'n(N) n(N-1) ... n(0)'


input_project (self)

Extract the input language of the transducer.

All transition symbol pairs isymbol:osymbol are changed to isymbol:isymbol.


output_project (self)

Extract the output language of the transducer.

All transition symbol pairs isymbol:osymbol are changed to osymbol:osymbol.


compose (self, another)

Compose this transducer with another.

  • another The second argument in the composition. Not modified.

lenient_composition (self, another)

Perform a lenient composition on this transducer and another. TODO: explain more.


compose_intersect (self, v, invert=False)

Compose this transducer with the intersection of transducers in v. If invert is true, then compose the intersection of the transducers in v with this transducer.

The algorithm used by this function is faster than intersecting all transducers one by one and then composing this transducer with the intersection.

Precondition: The transducers in v are deterministic and epsilon-free.

  • v A tuple of transducers.
  • invert Whether the intersection of the transducers in v is composed with this transducer.

concatenate (self, another)

Concatenate this transducer with another.


disjunct (self, another)

Disjunct this transducer with another.


intersect (self, another)

Intersect this transducer with another.


subtract (self, another)

Subtract transducer another from this transducer.


minus (self, another)

Alias for subtract. See also: subtract


conjunct (self, another)

Alias for intersect. See also: intersect


convert (self, type, options='')

Convert the transducer into an equivalent transducer in format type.

If a weighted transducer is converted into an unweighted one, all weights are lost. In the reverse case, all weights are initialized to the semiring's one.

A transducer of type SFST_TYPE, TROPICAL_OPENFST_TYPE, LOG_OPENFST_TYPE or FOMA_TYPE can be converted into an HFST_OL_TYPE or HFST_OLW_TYPE transducer, but an HFST_OL_TYPE or HFST_OLW_TYPE transducer cannot be converted to any other type.

Note: For conversion between HfstBasicTransducer and HfstTransducer, see HfstTransducer.__init__ and HfstBasicTransducer.__init__.


write_att (self, ofile, write_weights=True)

Write the transducer in AT&T format to file ofile, write_weights defines whether weights are written.

The fields in the resulting AT&T format are separated by tabulator characters.

NOTE: If the transition symbols contain space characters,the spaces are printed as '@SPACE@' because whitespace characters are used as field separators in AT&T format. Epsilon symbols are printed as '@0@'.

If several transducers are written in the same file, they must be separated by a line of two consecutive hyphens "--", so that they will be read correctly by hfst.read_att.

An example:

tr1 = hfst.regex('[foo:bar baz:0 " "]::0.3')
tr2 = hfst.empty_fst()
tr3 = hfst.epsilon_fst(0.5)
tr4 = hfst.regex('[foo]')
tr5 = hfst.empty_fst()

f = hfst.hfst_open('testfile.att', 'w')
for tr in [tr1, tr2, tr3, tr4]:
    tr.write_att(f)
    f.write('--\n')
tr5.write_att(f)
f.close()

This will yield a file 'testfile.att' that looks as follows:

0       1       foo     bar     0.299805
1       2       baz     @0@     0.000000
2       3       @_SPACE_@       @_SPACE_@       0.000000
3       0.000000
--
--
0       0.500000
--
0       1       foo     foo     0.000000
1       0.000000
--

Throws:

See also: HfstOutputStream.write


write_att (self, filename, write_weights=True)

Write the transducer in AT&T format to file named filename. write_weights defines whether weights are written.

If the file exists, it is overwritten. If the file does not exist, it is created.


priority_union (self, another)

Make priority union of this transducer with another.

For the operation t1.priority_union(t2), the result is a union of t1 and t2, except that whenever t1 and t2 have the same string on left side, the path in t2 overrides the path in t1.

Example

Transducer 1 (t1):
a : a
b : b

Transducer 2 (t2):
b : B
c : C

Result ( t1.priority_union(t2) ):
a : a
b : B
c : C

For more information, read fsmbook.


cross_product (self, another)

Make cross product of this transducer with another. It pairs every string of this with every string of another. If strings are not the same length, epsilon padding will be added in the end of the shorter string. Precondition: Both transducers must be automata, i.e. map strings onto themselves.


shuffle (self, another)

Shuffle this transducer with transducer another.

If transducer A accepts string 'foo' and transducer B string 'bar', the transducer that results from shuffling A and B accepts all strings [(f|b)(o|a)(o|r)].

Precondition: Both transducers must be automata, i.e. map strings onto themselves.


insert_freely (self, ins)

Freely insert a transition or a transducer into the transducer.

  • ins The transition or transducer to be inserted.

If ins is a transition, i.e. a 2-tuple of strings: A transition is added to each state in this transducer. The transition leads from that state to itself with input and output symbols defined by ins. The weight of the transition is zero.

If ins is an HfstTransducer: A copy of ins is attached with epsilon transitions to each state of this transducer. After the operation, for each state S in this transducer, there is an epsilon transition that leads from state S to the initial state of ins, and for each final state of ins, there is an epsilon transition that leads from that final state to state S in this transducer. The weights of the final states in ins are copied to the epsilon transitions leading to state S.


set_final_weights (self, weight)

Set the weights of all final states to weight. If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.


push_weights_to_start (self)

Push weights towards initial state.

If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.

An example:

>>> import hfst
>>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
>>> tr.push_weights_to_start()
>>> print(tr)
0       1       a       a       2.000000
1       2       a       b       0.000000
2       3       b       b       0.000000
2       0.000000
3       0.000000
# See also: push_weights_to_end

push_weights_to_end (self)

Push weights towards final state(s).

If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.

An example:

>>> import hfst
>>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
>>> tr.push_weights_to_end()
>>> print(tr)
0       1       a       a       0.000000
1       2       a       b       0.000000
2       3       b       b       0.000000
2       2.000000
3       2.000000

See also: push_weights_to_start


substitute (self, s, S=None, **kwargs)

Substitute symbols or transitions in the transducer.

  • s The symbol or transition to be substituted. Can also be a dictionary of substitutions, if S == None.
  • S The symbol, transition, a tuple of transitions or a transducer (hfst.HfstTransducer) that substitutes s.
  • kwargs Arguments recognized are 'input' and 'output', their values can be False or True, True being the default. These arguments are valid only if s and S are strings, else they are ignored.
  • input Whether substitution is performed on input side, defaults to True. Valid only if s and S are strings.
  • output Whether substitution is performed on output side, defaults to True. Valid only if s and \ S are strings.

For more information, see HfstBasicTransducer.substitute. The function works similarly, with the exception of argument S, which must be an HfstTransducer instead of HfstBasicTransducer.

See also: HfstBasicTransducer.substitute


lookup_optimize (self)

Optimize the transducer for lookup. This effectively converts the transducer into HFST_OL_TYPE.


lookup (self, input, kwargs)

Lookup string input.

  • input The input. A string or a pre-tokenized tuple of symbols (i.e. a tuple of strings).
  • kwargs Possible parameters and their default values are: obey_flags=True, max_number=-1, time_cutoff=0.0, output='tuple'
  • obey_flags Whether flag diacritics are obeyed. Always True for HFST_OL(W)_TYPE transducers.
  • max_number Maximum number of results returned, defaults to -1, i.e. infinity.
  • time_cutoff How long the function can search for results before returning, expressed in seconds. Defaults to 0.0, i.e. infinitely. Always 0.0 for transducers that are not of HFST_OL(W)_TYPE.
  • output Possible values are 'tuple', 'text' and 'raw', 'tuple' being the default.

Note: This function has an efficient implementation only for optimized lookup format HFST_OL_TYPE or HFST_OLW_TYPE). Other formats perform the lookup via composition. Consider converting the transducer to optimized lookup format or to a HfstBasicTransducer. Conversion to HFST_OL(W)_TYPE might take a while but the lookup is fast. Conversion to HfstBasicTransducer is quick but lookup is slower.


remove_optimization (self)

Remove lookup optimization. This effectively converts transducer (back) into default fst type.


extract_paths (self, **kwargs)

Extract paths that are recognized by the transducer.

  • kwargs Arguments recognized are filter_flags, max_cycles, max_number, obey_flags, output, random.
  • filter_flags Whether flags diacritics are filtered out from the result (default True).
  • max_cycles Indicates how many times a cycle will be followed, with negative numbers indicating unlimited (default -1 i.e. unlimited).
  • max_number The total number of resulting strings is capped at this value, with 0 or negative indicating unlimited (default -1 i.e. unlimited).
  • obey_flags Whether flag diacritics are validated (default True).
  • output Output format. Values recognized: 'text', 'raw', 'dict' (the default). 'text' returns a string where paths are separated by newlines and each path is represented as input_string + ":" + output_string + "\t" t weight. 'raw' yields a tuple of all paths where each path is a 2-tuple consisting of a weight and a tuple of all transition symbol pairs, each symbol pair being a 2-tuple of an input and an output symbol. 'dict' gives a dictionary that maps each input string into a list of possible outputs, each output being a 2-tuple of an output string and a weight.
  • random Whether result strings are fetched randomly (default False). Return: The extracted strings. output controls how they are represented.

Precondition: The transducer must be acyclic, if both max_number and max_cycles have unlimited values. Else a hfst.exceptions.TransducerIsCyclicException will be thrown.

An example:

>>> tr = hfst.regex('a:b+ (a:c+)')
>>> print(tr)
0       1       a       b       0.000000
1       1       a       b       0.000000
1       2       a       c       0.000000
1       0.000000
2       2       a       c       0.000000
2       0.000000

>>> print(tr.extract_paths(max_cycles=1, output='text'))
a:b     0
aa:bb   0
aaa:bbc 0
aaaa:bbcc       0
aa:bc   0
aaa:bcc 0

>>> print(tr.extract_paths(max_number=4, output='text'))
a:b     0
aa:bc   0
aaa:bcc 0
aaaa:bccc       0

>>> print(tr.extract_paths(max_cycles=1, max_number=4, output='text'))
a:b     0
aa:bb   0
aa:bc   0
aaa:bcc 0

Throws:

See also: HfstTransducer.n_best

Note: Special symbols are printed as such.

Todo: a link to flag diacritics


extract_shortest_paths (self)

Extract shortest paths of the transducer. Return: A dictionary.


extract_longest_paths (self, **kwargs)

Extract longest paths of the transducer. Return: A dictionary.


longest_path_size (self, **kwargs)

Get length of longest path of the transducer.


lookup (self, input, limit=-1)

Lookup or apply a single tokenized string tok_input and return a maximum of limit results.

TODO: This is a version of lookup that handles flag diacritics as ordinary symbols and does not validate the sequences prior to outputting. Currently, this function calls lookup_fd.

Todo: Handle flag diacritics as ordinary symbols instead of calling lookup_fd.

See also: lookup_fd

Return: HfstOneLevelPaths pointer

  • tok_input A tuple of consecutive symbols (strings).
  • limit Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous.

Lookup or apply a single string input and return a maximum of limit results.

This is an overloaded lookup function that leaves tokenizing to the transducer. Return: HfstOneLevelPaths pointer


lookup (self, tok, input, limit=-1)

Lookup or apply a single string input and return a maximum of limit results. tok defined how s is tokenized.

This is an overloaded lookup function that leaves tokenizing to tok. Return: HfstOneLevelPaths pointer


lookup_fd (self, tok_input, limit = -1)

Lookup or apply a single string tok_input minding flag diacritics properly and return a maximum of limit results.

Traverse all paths on logical first level of the transducer to produce all possible outputs on the second. This is in effect a fast composition of single path from left hand side.

This is a version of lookup that handles flag diacritics as epsilons and validates the sequences prior to outputting. Epsilons on the second level are represented by empty strings in the results.

Precondition: The transducer must be of type HFST_OL_TYPE or HFST_OLW_TYPE. This function is not implemented for other transducer types.

  • tok_input A tuple of consecutive symbols (strings) to look up.
  • limit (Currently ignored.) Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous.

See also: is_lookup_infinitely_ambiguous Return: HfstOneLevelPaths pointer

Todo: Do not ignore argument limit.


lookup_fd (self, input, limit = -1)

Lookup or apply a single string s minding flag diacritics properly and return a maximum of limit results.

This is an overloaded lookup function that leaves tokenizing to the transducer. Return: HfstOneLevelPaths pointer


lookup_fd(tok, input, limit = -1)

Lookup or apply a single string input minding flag diacritics properly and return a maximum of limit results. tok defines how s is tokenized.

This is an overloaded lookup function that leaves tokenizing to tok. Return: HfstOneLevelPaths pointer


is_lookup_infinitely_ambiguous (self, tok_input)

Whether lookup of path input will have infinite results.

Currently, this function will return whether the transducer is infinitely ambiguous on any lookup path found in the transducer, i.e. the argument input is ignored.

Todo: Do not ignore the argument input


is_infinitely_ambiguous (self)

Whether the transducer is infinitely ambiguous.

A transducer is infinitely ambiguous if there exists an input that will yield infinitely many results, i.e. there are input epsilon loops that are traversed with that input.


has_flag_diacritics (self)

Whether the transducer has flag diacritics in its transitions.

Clone this wiki locally