This is a tool to convert TEI P5 dictionaries to slob format. Some free TEI P5 dictionaries are available at http://freedict.org/
Create a Python 3 virtual environment and install slob.py as described at http://github.com/itkach/slob/.
In this virtual environment, run:
pip install git+https://github.com/itkach/tei2slob.git
Download a dictionary archive and unpack it. For example:
wget http://downloads.sourceforge.net/project/freedict/English%20-%20German/0.3.6/freedict-eng-deu-0.3.6.src.tar.bz2
tar -xvf freedict-eng-deu-0.3.6.src.tar.bz2
Then run the converter:
tei2slob eng-deu/eng-deu.tei
eng-deu-0.3.6.slob will be created in the same directory.
The converter attempts to populate dictionary tags based on information
in the .tei file's header section, but it may fail because the way some
elements (like license name) are represented is not standardized and
varies across dictionaries, so be sure to check the tags:
slob info eng-deu-0.3.6.slob
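To judge whether the populated tags look right, it can help to see what the .tei header itself contains. The following is a minimal Python sketch for that purpose, not a description of how tei2slob reads the header. It assumes the file uses the standard TEI P5 namespace and looks only at three common header locations; the function name summarize_tei_header and the choice of locations are illustrative.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
HEADER_TAG = "{%s}teiHeader" % TEI_NS["tei"]

# Header locations that commonly hold tag-worthy metadata in TEI P5 files.
# Which of them a given dictionary fills in, and how, varies (assumption:
# this selection is illustrative, not the list tei2slob uses).
HEADER_PATHS = {
    "title": "tei:fileDesc/tei:titleStmt/tei:title",
    "edition": "tei:fileDesc/tei:editionStmt/tei:edition",
    "availability": "tei:fileDesc/tei:publicationStmt/tei:availability",
}


def summarize_tei_header(path):
    """Return whitespace-normalized text found at a few teiHeader locations.

    Locations that are missing from the file map to None.
    """
    header = None
    with open(path, "rb") as tei_file:
        # Stream the file and stop once the header has been read, so the
        # dictionary body is never parsed.
        for _event, elem in ET.iterparse(tei_file, events=("end",)):
            if elem.tag == HEADER_TAG:
                header = elem
                break

    summary = {}
    for name, element_path in HEADER_PATHS.items():
        node = header.find(element_path, TEI_NS) if header is not None else None
        if node is None:
            summary[name] = None
        else:
            # Collapse nested markup and line breaks into one line of text.
            summary[name] = " ".join("".join(node.itertext()).split())
    return summary
```

Calling summarize_tei_header("eng-deu/eng-deu.tei") returns a small dict that can be compared by eye with the output of slob info.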
Set tag values as necessary, for example:
slob tag -n license.name -v "GNU General Public License" eng-deu-0.3.6.slob
slob tag -n license.url -v "http://www.gnu.org/licenses/gpl.html" eng-deu-0.3.6.slob
slob tag -n created.by -v me@example.com eng-deu-0.3.6.slob
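When several tags need fixing, the same slob tag commands can be generated from a mapping. This is a small Python sketch that only wraps the command form shown above; the helper names slob_tag_commands and apply_tags are hypothetical, and it assumes the slob executable is on PATH in the active virtual environment.

```python
import subprocess


def slob_tag_commands(slob_file, tags):
    """Build one `slob tag -n NAME -v VALUE FILE` argument list per tag."""
    return [
        ["slob", "tag", "-n", name, "-v", value, slob_file]
        for name, value in tags.items()
    ]


def apply_tags(slob_file, tags, dry_run=False):
    """Run the tag commands in order and return them.

    With dry_run=True nothing is executed, which is handy for reviewing
    the commands first.
    """
    commands = slob_tag_commands(slob_file, tags)
    if not dry_run:
        for command in commands:
            # check=True stops at the first command that fails.
            subprocess.run(command, check=True)
    return commands
```

For example, apply_tags("eng-deu-0.3.6.slob", {"license.name": "GNU General Public License"}, dry_run=True) returns the command that would be run without touching the file.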
uri is an important tag. When different dictionaries have the same uri,
it means they contain keys belonging to the same logical dictionary.
So when compiling a new version of an existing dictionary, make sure
uri remains the same.
usage: tei2slob [-h] [-o OUTPUT_FILE] [-c {lzma2,zlib}] [-b BIN_SIZE]
                [-a CREATED_BY] [-w WORK_DIR]
                input_file

positional arguments:
  input_file            TEI file name

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Name of output slob file
  -c {lzma2,zlib}, --compression {lzma2,zlib}
                        Name of compression to use. Default: zlib
  -b BIN_SIZE, --bin-size BIN_SIZE
                        Minimum storage bin size in kilobytes. Default: 256
  -a CREATED_BY, --created-by CREATED_BY
                        Value for created.by tag. Identifier (e.g. name or
                        email) for slob file creator
  -w WORK_DIR, --work-dir WORK_DIR
                        Directory for temporary files created during
                        compilation. Default: .
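
The options above can be combined in a single invocation. As a hedged illustration, the Python sketch below assembles a tei2slob command line from keyword arguments, using only the flags listed in the usage text. The helper name tei2slob_command and the output file name in the example are hypothetical.

```python
COMPRESSIONS = ("lzma2", "zlib")  # choices listed in the usage text


def tei2slob_command(input_file, output_file=None, compression=None,
                     bin_size=None, created_by=None, work_dir=None):
    """Build a tei2slob argument list; options left as None are omitted."""
    if compression is not None and compression not in COMPRESSIONS:
        raise ValueError("compression must be one of %s" % (COMPRESSIONS,))

    command = ["tei2slob"]
    options = [
        ("-o", output_file),
        ("-c", compression),
        ("-b", bin_size),
        ("-a", created_by),
        ("-w", work_dir),
    ]
    for flag, value in options:
        if value is not None:
            command += [flag, str(value)]
    # The TEI file is the single positional argument.
    command.append(input_file)
    return command


# To run the result (assumes tei2slob is installed in the active environment):
#   import subprocess
#   subprocess.run(tei2slob_command("eng-deu/eng-deu.tei",
#                                   output_file="eng-deu.slob",
#                                   compression="lzma2",
#                                   created_by="me@example.com"), check=True)
```

Passing the creator with -a at conversion time avoids a separate slob tag call for created.by afterwards.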