-
Notifications
You must be signed in to change notification settings - Fork 1
HfstTransducer
-
class HfstTransducer
- Argument handling
- Implementation types
- Creating transducers
- __init__ (self)
- __init__ (self, another)
- __init__ (self, t, type)
- copy (self)
- set_name (self, name)
- get_name (self)
- str (self)
- prune (self)
- set_property (self, property, value)
- get_property (self, property)
- get_properties (self)
- get_alphabet (self)
- insert_to_alphabet (self, symbol)
- remove_from_alphabet (self, symbol)
- eliminate_flag (self, symbol)
- eliminate_flags (self, symbols)
- is_automaton (self)
- is_cyclic (self)
- get_type (self)
- compare (self, another)
- remove_epsilons (self)
- determinize (self)
- number_of_states (self)
- number_of_arcs (self)
- write (self, ostr)
- write_att (self, f, write_weights=True)
- write_prolog (self, f, name, write_weights=True)
- minimize (self)
- n_best (self, n)
- repeat_star (self)
- repeat_plus (self)
- repeat_n (self, n)
- repeat_n_minus (self, n)
- repeat_n_plus (self, n)
- repeat_n_to_k (self, n, k)
- optionalize (self)
- invert (self)
- reverse (self)
- input_project (self)
- output_project (self)
- compose (self, another)
- lenient_composition (self, another)
- compose_intersect (self, v, invert=False)
- concatenate (self, another)
- disjunct (self, another)
- intersect (self, another)
- subtract (self, another)
- minus (self, another)
- conjunct (self, another)
- convert (self, type, options='')
- write_att (self, ofile, write_weights=True)
- write_att (self, filename, write_weights=True)
- priority_union (self, another)
- cross_product (self, another)
- shuffle (self, another)
- insert_freely (self, ins)
- set_final_weights (self, weight)
- push_weights_to_start (self)
- push_weights_to_end (self)
- substitute (self, s, S=None, **kwargs)
- lookup_optimize (self)
- lookup(self, input, **kwargs)
- remove_optimization (self)
- extract_paths (self, **kwargs)
- extract_shortest_paths (self)
- extract_longest_paths (self, **kwargs)
- longest_path_size (self, **kwargs)
- lookup(input, limit=-1)
- lookup(tok, input, limit=-1)
- lookup_fd(tok_input, limit = -1)
- lookup_fd(input, limit = -1)
- lookup_fd(tok, input, limit = -1)
- is_lookup_infinitely_ambiguous (self, tok_input)
- is_infinitely_ambiguous (self)
- has_flag_diacritics (self)
A synchronous finite-state transducer.
Transducer functions modify their calling object and return a reference to the calling object after modification, unless otherwise mentioned. Transducer arguments are usually not modified.
# transducer is reversed
transducer.reverse()
# transducer2 is not modified, but a copy of it is disjuncted with
# transducer1
transducer1.disjunct(transducer2)
# a chain of functions is possible
transducer.reverse().determinize().reverse().determinize()
Currently, an HfstTransducer has three implementation types that are well supported. When an HfstTransducer is created, its type is defined with an argument. For functions that take a transducer as an argument, the type of the calling transducer must be the same as the type of the argument transducer:
# this will cause a TransducerTypeMismatchException:
tropical_transducer.disjunct(foma_transducer)
# this works, but weights are lost in the conversion
tropical_transducer.convert(hfst.ImplementationType.SFST_TYPE).disjunct(sfst_transducer)
# this works, information is not lost
tropical_transducer.disjunct(sfst_transducer.convert(hfst.ImplementationType.TROPICAL_OPENFST_TYPE))
With HfstTransducer constructors it is possible to create empty, epsilon, one-transition and single-path transducers. Transducers can also be created from scratch with HfstBasicTransducer and converted to an HfstTransducer. More complex transducers can be combined from simple ones with various functions. class HfstTransducer:
Whether HFST is linked to the transducer library needed by implementation type type
.
is_implementation_type_available(type):
Create an empty transducer.
tr = hfst.HfstTransducer()
assert(tr.compare(hfst.empty_fst()))
Create a deep copy of HfstTransducer another
or a transducer equivalent to HfstBasicTransducer another
.
-
another
An HfstTransducer or HfstBasicTransducer.
An example:
tr1 = hfst.regex('foo bar foo')
tr2 = hfst.HfstTransducer(tr1)
tr2.substitute('foo','FOO')
tr1.concatenate(tr2)
Create an HFST transducer equivalent to HfstBasicTransducer t.
The type of the created transducer is defined by type
.
-
t
An HfstBasicTransducer. -
type
The type of the resulting transducer.
If you want to use the default type, you can just call hfst.HfstTransducer(fsm)
Return a deep copy of the transducer.
tr = hfst.regex('[foo:bar::0.3]*')
TR = tr.copy()
assert(tr.compare(TR))
Rename the transducer name
.
-
name
The name of the transducer.
See also: get_name
Get the name of the transducer.
See also: set_name
An AT&T representation of the transducer.
Defined for print command. An example:
>>> print(hfst.regex('[foo:bar::2]+'))
0 1 foo bar 2.000000
1 1 foo bar 2.000000
1 0.000000
Todo: Works only for small transducers.
Make transducer coaccessible.
A transducer is coaccessible iff there is a path from every state to a final state.
Set arbitrary string property property
to value
.
-
property
A string naming the property. -
value
A string expressing the value ofproperty
.
Note: set_property('name', 'name of the transducer') equals set_name('name of the transducer').
Note: While this function is capable of creating endless amounts of arbitrary metadata, it is suggested that property names are drawn from central repository, or prefixed with "x-". A property that does not follow this convention may affect the behavior of transducer in future releases.
Get arbitrary string propert @a property.
-
property
The name of the property whose value is returned.
get_property('name') works like get_name().
Get all properties from the transducer.
Return: A dictionary whose keys are properties and whose values are the values of those properties.
Get the alphabet of the transducer.
The alphabet is defined as the set of symbols known to the transducer.
Return: A tuple of strings.
Explicitly insert symbol
to the alphabet of the transducer.
-
symbol
The symbol (string) to be inserted.
Note: Usually this function is not needed since new symbols are added to the alphabet by default.
Remove symbol
from the alphabet of the transducer.
-
symbol
The symbol (string) to be removed.
Precondition: symbol
does not occur in any transition of the transducer.
Note: Use with care, removing a symbol that occurs in a transition of the transducer can have unexpected results.
Eliminate flag diacritic symbol
from the transducer.
-
symbol
The flag to be eliminated. TODO: explain more.
An equivalent transducer with no flags symbol
.
Eliminate flag diacritics listed in symbols
from the transducer.
-
symbols
The flags to be eliminated. TODO: explain more.
An equivalent transducer with no flags listed in symbols
.
Whether each transition in the transducer has equivalent input and output symbols. Note: Transition with hfst.UNKNOWN on both sides IS NOT a transition with equivalent input and output symbols. Note: Transition with hfst.IDENTITY on both sides IS a transition with equivalent input and output symbols.
Whether the transducer is cyclic.
The implementation type of the transducer. Return: hfst.ImplementationType
Whether this transducer and another
are equivalent.
-
another
The compared transducer. Precondition:self
andanother
must have the same implementation type.
Two transducers are equivalent iff they accept the same input/output string pairs with the same weights and the same alignments. Note: For weighted transducers, the function often returns false negatives due to weight precision issues.
Remove all epsilon:epsilon transitions from the transducer so that the resulting transducer is equivalent to the original one.
Determinize the transducer.
Determinizing a transducer yields an equivalent transducer that has no state with two or more transitions whose input:output symbol pairs are the same.
The number of states in the transducer.
The number of transitions in the transducer.
Write the transducer in binary format to ostr
.
-
ostr
A hfst.HfstOutputStream where the transducer is written.
Write the transducer in AT&T format to file f,
write_weights
defined whether weights are written.
-
f
A python file where transducer is written. -
write_weights
Whether weights are written.
Write the transducer in prolog format with name name
to file f,
write_weights
defined whether weights are written.
-
f
A python file where the transducer is written. -
name
The name of the transducer that must be given in a prolog file. -
write_weights
Whether weights are written.
Minimize the transducer.
Minimizing a transducer yields an equivalent transducer with the smallest number of states.
Known bugs: OpenFst's minimization algorithm seems to add epsilon transitions to weighted transducers?
Extract n
best paths of the transducer.
In the case of a weighted transducer (TROPICAL_OPENFST_TYPE or LOG_OPENFST_TYPE, best paths are defined as paths with the lowest weight. In the case of an unweighted transducer (SFST_TYPE or FOMA_TYPE), the function returns random paths.
This function is not implemented for FOMA_TYPE or SFST_TYPE. If this function is called by an HfstTransducer of type FOMA_TYPE or SFST_TYPE, it is converted to TROPICAL_OPENFST_TYPE, paths are extracted and it is converted back to FOMA_TYPE or SFST_TYPE. If HFST is not linked to OpenFst library, an hfst.exceptions.ImplementationTypeNotAvailableException is thrown.
A concatenation of N transducers where N is any number from zero to infinity.
A concatenation of N transducers where N is any number from one to infinity.
A concatenation of n
transducers.
A concatenation of N transducers where N is any number from zero to n,
inclusive.
A concatenation of N transducers where N is any number from n
to infinity, inclusive.
A concatenation of N transducers where N is any number from n
to k,
inclusive.
Disjunct the transducer with an epsilon transducer.
Swap the input and output symbols of each transition in the transducer.
Reverse the transducer.
A reverted transducer accepts the string 'n(0) n(1) ... n(N)' iff the original transducer accepts the string 'n(N) n(N-1) ... n(0)'
Extract the input language of the transducer.
All transition symbol pairs isymbol:osymbol are changed to isymbol:isymbol.
Extract the output language of the transducer.
All transition symbol pairs isymbol:osymbol are changed to osymbol:osymbol.
Compose this transducer with another
.
-
another
The second argument in the composition. Not modified.
Perform a lenient composition on this transducer and another
.
TODO: explain more.
Compose this transducer with the intersection of
transducers in v.
If invert
is true, then compose the
intersection of the transducers in v
with this transducer.
The algorithm used by this function is faster than intersecting all transducers one by one and then composing this transducer with the intersection.
Precondition: The transducers in v
are deterministic and epsilon-free.
-
v
A tuple of transducers. -
invert
Whether the intersection of the transducers inv
is composed with this transducer.
Concatenate this transducer with another
.
Disjunct this transducer with another
.
Intersect this transducer with another
.
Subtract transducer another
from this transducer.
Alias for subtract. See also: subtract
Alias for intersect. See also: intersect
Convert the transducer into an equivalent transducer in format type
.
If a weighted transducer is converted into an unweighted one, all weights are lost. In the reverse case, all weights are initialized to the semiring's one.
A transducer of type SFST_TYPE, TROPICAL_OPENFST_TYPE, LOG_OPENFST_TYPE or FOMA_TYPE can be converted into an HFST_OL_TYPE or HFST_OLW_TYPE transducer, but an HFST_OL_TYPE or HFST_OLW_TYPE transducer cannot be converted to any other type.
Note: For conversion between HfstBasicTransducer and HfstTransducer, see HfstTransducer.__init__ and HfstBasicTransducer.__init__.
Write the transducer in AT&T format to file ofile,
write_weights
defines whether weights are written.
The fields in the resulting AT&T format are separated by tabulator characters.
NOTE: If the transition symbols contain space characters,the spaces are printed as '@SPACE@' because whitespace characters are used as field separators in AT&T format. Epsilon symbols are printed as '@0@'.
If several transducers are written in the same file, they must be separated by a line of two consecutive hyphens "--", so that they will be read correctly by hfst.read_att.
An example:
tr1 = hfst.regex('[foo:bar baz:0 " "]::0.3')
tr2 = hfst.empty_fst()
tr3 = hfst.epsilon_fst(0.5)
tr4 = hfst.regex('[foo]')
tr5 = hfst.empty_fst()
f = hfst.hfst_open('testfile.att', 'w')
for tr in [tr1, tr2, tr3, tr4]:
tr.write_att(f)
f.write('--\n')
tr5.write_att(f)
f.close()
This will yield a file 'testfile.att' that looks as follows:
0 1 foo bar 0.299805
1 2 baz @0@ 0.000000
2 3 @_SPACE_@ @_SPACE_@ 0.000000
3 0.000000
--
--
0 0.500000
--
0 1 foo foo 0.000000
1 0.000000
--
Throws:
See also: HfstOutputStream.write
Write the transducer in AT&T format to file named filename.
write_weights
defines whether weights are written.
If the file exists, it is overwritten. If the file does not exist, it is created.
Make priority union of this transducer with another
.
For the operation t1.priority_union(t2), the result is a union of t1 and t2, except that whenever t1 and t2 have the same string on left side, the path in t2 overrides the path in t1.
Example
Transducer 1 (t1):
a : a
b : b
Transducer 2 (t2):
b : B
c : C
Result ( t1.priority_union(t2) ):
a : a
b : B
c : C
For more information, read fsmbook.
Make cross product of this transducer with another.
It pairs every string of this with every string of another
.
If strings are not the same length, epsilon padding will be added in the end of the shorter string.
Precondition: Both transducers must be automata, i.e. map strings onto themselves.
Shuffle this transducer with transducer another
.
If transducer A accepts string 'foo' and transducer B string 'bar', the transducer that results from shuffling A and B accepts all strings [(f|b)(o|a)(o|r)].
Precondition: Both transducers must be automata, i.e. map strings onto themselves.
Freely insert a transition or a transducer into the transducer.
-
ins
The transition or transducer to be inserted.
If ins
is a transition, i.e. a 2-tuple of strings: A transition is added to each state in this transducer.
The transition leads from that state to itself with input and output symbols defined by ins
.
The weight of the transition is zero.
If ins
is an HfstTransducer:
A copy of ins
is attached with epsilon transitions
to each state of this transducer. After the operation, for each
state S in this transducer, there is an epsilon transition
that leads from state S to the initial state of ins,
and for each final state of ins,
there is an epsilon transition
that leads from that final state to state S in this transducer.
The weights of the final states in ins
are copied to the
epsilon transitions leading to state S.
Set the weights of all final states to weight
.
If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.
Push weights towards initial state.
If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.
An example:
>>> import hfst
>>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
>>> tr.push_weights_to_start()
>>> print(tr)
0 1 a a 2.000000
1 2 a b 0.000000
2 3 b b 0.000000
2 0.000000
3 0.000000
Push weights towards final state(s).
If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.
An example:
>>> import hfst
>>> tr = hfst.regex('[a::1 a:b::0.3 (b::0)]::0.7;')
>>> tr.push_weights_to_end()
>>> print(tr)
0 1 a a 0.000000
1 2 a b 0.000000
2 3 b b 0.000000
2 2.000000
3 2.000000
See also: push_weights_to_start
Substitute symbols or transitions in the transducer.
-
s
The symbol or transition to be substituted. Can also be a dictionary of substitutions, if S == None. -
S
The symbol, transition, a tuple of transitions or a transducer (hfst.HfstTransducer) that substitutess
. -
kwargs
Arguments recognized are 'input' and 'output', their values can be False or True, True being the default. These arguments are valid only ifs
andS
are strings, else they are ignored. -
input
Whether substitution is performed on input side, defaults to True. Valid only ifs
andS
are strings. -
output
Whether substitution is performed on output side, defaults to True. Valid only ifs
and \ S are strings.
For more information, see HfstBasicTransducer.substitute. The function works similarly, with the exception of argument S,
which must be an HfstTransducer instead of HfstBasicTransducer.
See also: HfstBasicTransducer.substitute
Optimize the transducer for lookup. This effectively converts the transducer into HFST_OL_TYPE.
Lookup string input
.
-
input
The input. A string or a pre-tokenized tuple of symbols (i.e. a tuple of strings). -
kwargs
Possible parameters and their default values are: obey_flags=True, max_number=-1, time_cutoff=0.0, output='tuple' -
obey_flags
Whether flag diacritics are obeyed. Always True for HFST_OL(W)_TYPE transducers. -
max_number
Maximum number of results returned, defaults to -1, i.e. infinity. -
time_cutoff
How long the function can search for results before returning, expressed in seconds. Defaults to 0.0, i.e. infinitely. Always 0.0 for transducers that are not of HFST_OL(W)_TYPE. -
output
Possible values are 'tuple', 'text' and 'raw', 'tuple' being the default.
Note: This function has an efficient implementation only for optimized lookup format HFST_OL_TYPE or HFST_OLW_TYPE). Other formats perform the lookup via composition. Consider converting the transducer to optimized lookup format or to a HfstBasicTransducer. Conversion to HFST_OL(W)_TYPE might take a while but the lookup is fast. Conversion to HfstBasicTransducer is quick but lookup is slower.
Remove lookup optimization. This effectively converts transducer (back) into default fst type.
Extract paths that are recognized by the transducer.
-
kwargs
Arguments recognized are filter_flags, max_cycles, max_number, obey_flags, output, random. -
filter_flags
Whether flags diacritics are filtered out from the result (default True). -
max_cycles
Indicates how many times a cycle will be followed, with negative numbers indicating unlimited (default -1 i.e. unlimited). -
max_number
The total number of resulting strings is capped at this value, with 0 or negative indicating unlimited (default -1 i.e. unlimited). -
obey_flags
Whether flag diacritics are validated (default True). -
output
Output format. Values recognized: 'text', 'raw', 'dict' (the default). 'text' returns a string where paths are separated by newlines and each path is represented as input_string + ":" + output_string + "\t" t weight. 'raw' yields a tuple of all paths where each path is a 2-tuple consisting of a weight and a tuple of all transition symbol pairs, each symbol pair being a 2-tuple of an input and an output symbol. 'dict' gives a dictionary that maps each input string into a list of possible outputs, each output being a 2-tuple of an output string and a weight. -
random
Whether result strings are fetched randomly (default False). Return: The extracted strings.output
controls how they are represented.
Precondition: The transducer must be acyclic, if both max_number
and max_cycles
have unlimited values. Else a hfst.exceptions.TransducerIsCyclicException will be thrown.
An example:
>>> tr = hfst.regex('a:b+ (a:c+)')
>>> print(tr)
0 1 a b 0.000000
1 1 a b 0.000000
1 2 a c 0.000000
1 0.000000
2 2 a c 0.000000
2 0.000000
>>> print(tr.extract_paths(max_cycles=1, output='text'))
a:b 0
aa:bb 0
aaa:bbc 0
aaaa:bbcc 0
aa:bc 0
aaa:bcc 0
>>> print(tr.extract_paths(max_number=4, output='text'))
a:b 0
aa:bc 0
aaa:bcc 0
aaaa:bccc 0
>>> print(tr.extract_paths(max_cycles=1, max_number=4, output='text'))
a:b 0
aa:bb 0
aa:bc 0
aaa:bcc 0
Throws:
See also: HfstTransducer.n_best
Note: Special symbols are printed as such.
Todo: a link to flag diacritics
Extract shortest paths of the transducer. Return: A dictionary.
Extract longest paths of the transducer. Return: A dictionary.
Get length of longest path of the transducer.
Lookup or apply a single tokenized string tok_input
and return a maximum of limit
results.
TODO: This is a version of lookup that handles flag diacritics as ordinary symbols and does not validate the sequences prior to outputting. Currently, this function calls lookup_fd.
Todo: Handle flag diacritics as ordinary symbols instead of calling lookup_fd.
See also: lookup_fd
Return: HfstOneLevelPaths pointer
-
tok_input
A tuple of consecutive symbols (strings). -
limit
Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous.
Lookup or apply a single string input
and return a maximum of limit
results.
This is an overloaded lookup function that leaves tokenizing to the transducer. Return: HfstOneLevelPaths pointer
Lookup or apply a single string input
and return a maximum of limit
results. tok
defined how s
is tokenized.
This is an overloaded lookup function that leaves tokenizing to tok
.
Return: HfstOneLevelPaths pointer
Lookup or apply a single string tok_input
minding flag diacritics properly and return a maximum of limit
results.
Traverse all paths on logical first level of the transducer to produce all possible outputs on the second. This is in effect a fast composition of single path from left hand side.
This is a version of lookup that handles flag diacritics as epsilons and validates the sequences prior to outputting. Epsilons on the second level are represented by empty strings in the results.
Precondition: The transducer must be of type HFST_OL_TYPE or HFST_OLW_TYPE. This function is not implemented for other transducer types.
-
tok_input
A tuple of consecutive symbols (strings) to look up. -
limit
(Currently ignored.) Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous.
See also: is_lookup_infinitely_ambiguous Return: HfstOneLevelPaths pointer
Todo: Do not ignore argument limit
.
Lookup or apply a single string s
minding flag diacritics properly and return a maximum of limit
results.
This is an overloaded lookup function that leaves tokenizing to the transducer. Return: HfstOneLevelPaths pointer
Lookup or apply a single string input
minding flag diacritics properly and return a maximum of limit
results. tok
defines how s is tokenized.
This is an overloaded lookup function that leaves tokenizing to tok
.
Return: HfstOneLevelPaths pointer
Whether lookup of path input
will have infinite results.
Currently, this function will return whether the transducer
is infinitely ambiguous on any lookup path found in the transducer,
i.e. the argument input
is ignored.
Todo: Do not ignore the argument input
Whether the transducer is infinitely ambiguous.
A transducer is infinitely ambiguous if there exists an input that will yield infinitely many results, i.e. there are input epsilon loops that are traversed with that input.
Whether the transducer has flag diacritics in its transitions.
Package hfst
- AttReader
- PrologReader
- HfstBasicTransducer
- HfstBasicTransition
- HfstTransducer
- HfstInputStream
- HfstOutputStream
- MultiCharSymbolTrie
- HfstTokenizer
- LexcCompiler
- XreCompiler
- PmatchContainer
- ImplementationType