A simple way to assess the effort involved in post-editing is to count the number of edits required to transform MT output into a publishable translation. woerdle-zehla, which is Swabian for "word counter", takes the concept of edit distance to the document level and creates simple, intuitive reports for you. Review the results directly in the GUI or save them to disk for subsequent analysis and processing.
- The tool accepts bilingual SDLXLIFF files and Across HTML previews as inputs and extracts the following information:
  - Metadata, including:
    - source and target language IDs
    - name of the client/relation, project name, filename (optional)
  - String data, including:
    - Source strings
    - Target strings
    - Strings from version history (optional in Across)
- If no previous MT output is available, woerdle-zehla will create a sample object from your source data and send it for translation with an API call. The default path to your API key is `data/API_key.txt`. Currently, only the DeepL API is supported.
- The MT output is compared against the target translations and each segment is assigned an edit-distance score (aka Levenshtein distance). Individual scores are then combined to calculate a score at the document level (post-edit density).
- woerdle-zehla offers you three ways for further analysis:
  - A histogram view with discrete bins for segments within a certain score range. This is ideal for quickly assessing the distribution of your scores (e.g. bell-shaped, logarithmic, flat) and its skewness.
  - Text output for a limited number of segments from the low and high ends of the distribution. This gives you an idea of which segments work well with MT and which do not.
  - JSON exports containing metadata, sample segments, aligned translations and scores. Use these files to map scores to project data in your PMIS or to calculate the potential impact of any post-processing steps you wish to introduce to your MT processes.
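
The scoring steps above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function names, the character-level comparison, the normalization by the longer segment, the document-level aggregate (total edits over total length) and the JSON layout are all assumptions made for illustration.

```python
import json
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def segment_score(mt, target):
    """Normalized distance in [0, 1]; 0 means the MT output was not edited."""
    longest = max(len(mt), len(target))
    return levenshtein(mt, target) / longest if longest else 0.0


def document_score(pairs):
    """Assumed aggregate: total edits divided by total segment length."""
    edits = sum(levenshtein(mt, tgt) for mt, tgt in pairs)
    length = sum(max(len(mt), len(tgt)) for mt, tgt in pairs)
    return edits / length if length else 0.0


def histogram(scores, bins=10):
    """Count scores per equal-width bin over [0, 1]."""
    counts = Counter(min(int(s * bins), bins - 1) for s in scores)
    return [counts.get(i, 0) for i in range(bins)]


# Hypothetical (MT output, post-edited target) pairs.
pairs = [
    ("The cat sat on the mat.", "The cat sat on the mat."),
    ("He go to school.", "He goes to school."),
]
scores = [segment_score(mt, tgt) for mt, tgt in pairs]
report = {
    "scores": scores,
    "document_score": document_score(pairs),
    "histogram": histogram(scores, bins=5),
}
print(json.dumps(report, indent=2))
```

Because `levenshtein` works on any sequence, passing lists of words (for example `mt.split()`) instead of strings would give a word-level distance, at the cost of treating every changed word as one full edit regardless of how small the change is.
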
If you are using Anaconda as a package manager, no additional libraries are required. To start the GUI, run:

    python gui.py

Check out the settings options to manage additional aspects of the tool.
If you want to run the woerdle-zehla tool on a remote machine, use the accompanying Jupyter Notebook.
Feel free to drop me a line in case of any questions.