eaxelson edited this page Aug 29, 2017 · 8 revisions

This is a preliminary version of a wiki documenting the HFST Python interface. Currently the documentation is hosted on our GitHub web pages.

Package 'hfst'

HFST (Helsinki Finite-State Transducer technology) is intended for creating and manipulating weighted or unweighted synchronous transducers that implement regular relations. HFST software uses UTF-8 as its character encoding. Currently, HFST is implemented on top of the SFST, OpenFst and foma software libraries; other back ends may be added in a future release. The SFST and foma implementations are unweighted, and the OpenFst implementation is weighted.

Part of the HFST interface has also been implemented for HFST's own two transducer formats, HfstBasicTransducer and the optimized lookup format. The former is useful for accessing individual states and transitions of a transducer, converting between transducer formats and storing transducers in an implementation-independent format. The latter is used for fast lookup of strings in a transducer.
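To make the idea of "states and transitions" and "lookup" concrete, here is a minimal pure-Python sketch of a weighted transducer. It does not use the HFST API: the class `ToyTransducer` and its methods are hypothetical names invented for illustration, epsilon transitions are not handled, and weights are assumed to be summed along a path (as in a tropical semiring).

```python
class ToyTransducer:
    """Hypothetical illustration only; this is NOT the HFST API."""

    def __init__(self):
        # state -> list of (input symbol, output symbol, target state, weight)
        self.transitions = {}
        # final state -> final weight
        self.final_weights = {}

    def add_transition(self, source, in_sym, out_sym, target, weight=0.0):
        self.transitions.setdefault(source, []).append(
            (in_sym, out_sym, target, weight)
        )

    def set_final(self, state, weight=0.0):
        self.final_weights[state] = weight

    def lookup(self, symbols, start=0):
        """Return all (output string, total weight) pairs for the input symbols."""
        results = []
        # Each stack entry: (current state, input position, output so far, weight so far)
        stack = [(start, 0, [], 0.0)]
        while stack:
            state, pos, output, weight = stack.pop()
            if pos == len(symbols):
                # Input consumed: accept only if we ended in a final state.
                if state in self.final_weights:
                    total = weight + self.final_weights[state]
                    results.append(("".join(output), total))
                continue
            for in_sym, out_sym, target, w in self.transitions.get(state, []):
                if in_sym == symbols[pos]:
                    stack.append((target, pos + 1, output + [out_sym], weight + w))
        return sorted(results)


# A transducer relating "cat" to "dog" (weight 0) and to "dot" (weight 1).
t = ToyTransducer()
t.add_transition(0, "c", "d", 1)
t.add_transition(1, "a", "o", 2)
t.add_transition(2, "t", "g", 3)
t.add_transition(2, "t", "t", 3, weight=1.0)
t.set_final(3)

print(t.lookup(list("cat")))
print(t.lookup(list("cab")))
```

The explicit transition table plays the role that a state-and-transition level format serves: every arc can be inspected or modified individually. A format optimized for lookup would instead trade that editability for faster traversal.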

All back-end implementations - SFST, OpenFst and foma - work according to the same interface, so it is possible to compile the same piece of code using different back-end libraries. There are some differences related to weights, as only OpenFst supports them.

HFST is written in C++, but a Python interface is also available and is documented on these pages. The Python API is essentially a wrapper around the C++ API with some additional code and modifications. The C++ API was developed around the HFST command line tools, whereas the Python version is intended to be used as such and has been designed to be more user-friendly.

For a quick start to the HFST interface with examples, see here

The examples given in this documentation use Xerox transducer notation.

Features

Tutorial

Download

Links