UsingHfstWithOpenfstFomaAndSfst

eaxelson edited this page Aug 30, 2017 · 3 revisions

Using HFST with SFST, OpenFst and foma

A binary HfstTransducer consists of an HFST header (more on HFST headers on our Kitwiki pages) followed by the transducer in the format of its backend implementation. By default, HfstOutputStream writes the header. If you want to write backend transducers as such, without the HFST header, set the hfst_format keyword argument of the HfstOutputStream constructor to False:

HfstOutputStream(hfst_format=False)

The following piece of code will write a native OpenFst transducer with tropical weights to standard output:

test.py:

import hfst
ab = hfst.regex('a:b::2.8')
out = hfst.HfstOutputStream(hfst_format=False)
out.write(ab)
out.flush()
out.close()

Run on the command line (fstprint is a native OpenFst tool):

python test.py > ab.fst
fstprint ab.fst

output:

0       1       a       b       2.79980469
1

An hfst.HfstInputStream can also read backend transducers that do not have an HFST header. If we have the following files

symbols.txt:

EPSILON 0
a 1
b 2

ab.txt:

0 1 a b 0.5
1 0.3

test.py:

import hfst
istr = hfst.HfstInputStream()
while not istr.is_eof():
    tr = istr.read()
    print('Read transducer:')
    print(tr)
istr.close()

the command

cat ab.txt | fstcompile --isymbols=symbols.txt --osymbols=symbols.txt --keep_isymbols --keep_osymbols | python test.py

will compile a native OpenFst transducer (fstcompile is a native OpenFst tool), read it with HFST tools and print it to standard output in AT&T text format:

Read transducer:
0       1       a       b       0.500000
1       0.300000
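
The AT&T text format shown above can be read back with a few lines of plain Python. The following is a minimal sketch, not part of HFST: the names Transition, FinalState, parse_att_line and parse_att are hypothetical, and it assumes whitespace-separated transducer lines (source, target, input symbol, output symbol, optional weight) and final-state lines (state, optional weight), with a missing weight read as 0.0.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    source: int
    target: int
    input_symbol: str
    output_symbol: str
    weight: float


class FinalState(NamedTuple):
    state: int
    weight: float


def parse_att_line(line: str):
    """Parse one AT&T line into a Transition or FinalState (None for blank lines).

    Assumption: a missing weight means 0.0, the neutral tropical weight.
    """
    fields = line.split()
    if not fields:
        return None
    if len(fields) in (1, 2):
        # Final state: "state [weight]"
        weight = float(fields[1]) if len(fields) == 2 else 0.0
        return FinalState(int(fields[0]), weight)
    if len(fields) in (4, 5):
        # Transition: "source target input output [weight]"
        weight = float(fields[4]) if len(fields) == 5 else 0.0
        return Transition(int(fields[0]), int(fields[1]),
                          fields[2], fields[3], weight)
    raise ValueError(f"not a transducer line in AT&T format: {line!r}")


def parse_att(text: str):
    """Parse a whole AT&T listing, skipping blank lines."""
    parsed = (parse_att_line(line) for line in text.splitlines())
    return [item for item in parsed if item is not None]
```

For the listing above, parse_att("0\t1\ta\tb\t0.500000\n1\t0.300000\n") yields one Transition from state 0 to state 1 and one FinalState for state 1.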

For more information on HFST transducer formats and conversions, see our Kitwiki pages.


An issue with foma

Foma writes its binary transducers in gzipped format using the gz tools. However, we ran into problems when trying to write to standard output or read from standard input with the gz tools (the foma tools themselves do not write to or read from standard streams). We therefore chose to write, and accordingly read, foma transducers unzipped when writing or reading binary HfstTransducers of hfst.ImplementationType.FOMA_TYPE. As a result, when an HfstTransducer of FOMA_TYPE is written in its plain backend format, the user must zip it themselves before foma tools can use it. (Update: at least the newest releases of foma can also read unzipped transducers.) Similarly, a foma transducer must be unzipped before HFST tools can read it.

Suppose we have written a FOMA_TYPE HfstTransducer and want to use it with foma tools. First we write it, in its plain backend format, to file 'ab.foma' with the following piece of code:

import hfst
hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE)
ab = hfst.regex('a:b')
out = hfst.HfstOutputStream(filename='ab.foma', hfst_format=False)
out.write(ab)
out.flush()
out.close()

The command

gzip ab.foma

will create a file 'ab.foma.gz' that can be used by (older) foma tools.
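
If the zipping step has to happen inside a Python program rather than on the command line, the standard gzip module can do the same job. This is a minimal sketch using only the standard library; gzip_file is a hypothetical helper name, and it mirrors the default behaviour of the gzip command by removing the original file.

```python
import gzip
import os
import shutil


def gzip_file(path: str) -> str:
    """Compress `path` to `path + '.gz'` and remove the original, like `gzip path`."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the contents through the compressor
    os.remove(path)
    return gz_path
```

Calling gzip_file('ab.foma') would leave 'ab.foma.gz' in place of 'ab.foma'.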

An example of the opposite case follows. Suppose we have a foma transducer 'transducer.foma' and want to read it inside an HFST program. The file must first be given a .gz extension so that the program 'gunzip' recognizes it as a zipped file. The commands

mv transducer.foma transducer.foma.gz
gunzip transducer.foma.gz

overwrite the original file 'transducer.foma' with an unzipped version of the same file. Now the file can be used by HFST:

import hfst
instr = hfst.HfstInputStream('transducer.foma')
tr = instr.read()
instr.close()
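
The renaming and unzipping steps can also be done from Python before the file is opened with HFST. The sketch below uses only the standard library and is not part of HFST; is_gzipped and ensure_unzipped are hypothetical helper names, and the temporary file name is an arbitrary choice. It checks for the two-byte gzip signature first, so a file that is already unzipped is left untouched.

```python
import gzip
import os
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def ensure_unzipped(path: str) -> bool:
    """Decompress `path` in place if it is gzipped.

    Returns True if the file was rewritten, False if it was already unzipped.
    """
    if not is_gzipped(path):
        return False
    tmp_path = path + ".tmp"  # hypothetical temporary name next to the original
    with gzip.open(path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp_path, path)  # swap the unzipped copy in for the original
    return True
```

After ensure_unzipped('transducer.foma'), the file can be passed to hfst.HfstInputStream as in the snippet above, with no renaming needed.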