Home
This is a preliminary version of a wiki for documenting the HFST Python interface. The documentation is currently hosted on our GitHub web pages.
HFST, the Helsinki Finite-State Transducer technology, is intended for creating and manipulating weighted or unweighted synchronic transducers implementing regular relations. UTF-8 is the character encoding used in HFST software. Currently, HFST has been implemented using the SFST, OpenFst and foma software libraries; other back ends may be added in a future release. The SFST and foma implementations are unweighted, whereas the OpenFst implementation is weighted.
Part of the HFST interface has also been implemented for HFST's own two transducer formats, HfstBasicTransducer and the optimized lookup format. The former is useful for accessing individual states and transitions of a transducer, converting between transducer formats and storing transducers in an implementation-independent format. The latter is used for fast lookup of strings in a transducer.
All back-end implementations (SFST, OpenFst and foma) work according to the same interface, so the same piece of code can be used with different back-end libraries. The main differences concern weights, since only OpenFst supports them.
HFST is written in C++, but a Python interface, documented on these pages, is also available. The Python API is essentially a wrapper around the C++ API with some additional code and modifications. The C++ API was developed around the HFST command-line tools, whereas the Python API is intended to be used on its own and has been designed to be more user-friendly.
For a quick start to the HFST interface with examples, see "A quick start to the HFST interface with examples" under Tutorial below.
The examples in this documentation use Xerox transducer notation.
Features
- Create transducers and apply operations on them (hfst.HfstTransducer)
- Create transducers from scratch (hfst.HfstBasicTransducer)
- Iterate through a transducer's states and transitions (hfst.HfstBasicTransducer.states)
- Create transducers by tokenizing UTF-8 strings with multicharacter symbols (hfst.HfstTokenizer)
- Apply replace (hfst.xerox_rules.replace), two-level (hfst.sfst_rules.two_level_if), restriction and coercion rules
Tutorial
- A quick start to the HFST interface with examples
- Using HFST with SFST, OpenFst and foma
Download
- Download and install HFST Python API
Links
Package hfst
- AttReader
- PrologReader
- HfstBasicTransducer
- HfstBasicTransition
- HfstTransducer
- HfstInputStream
- HfstOutputStream
- MultiCharSymbolTrie
- HfstTokenizer
- LexcCompiler
- XreCompiler
- PmatchContainer
- ImplementationType