This project provides tools for transliterating Chechen text from Cyrillic script to Latin script.
- convert_json_to_tsv.py: Script to convert a JSON text corpus to a TSV wordlist.
- corpora_texts.json: JSON file containing the text corpus.
- corpora_wordlist.tsv: TSV file containing the word list.
- cyrl_latn_dictionary.json: JSON file with the Cyrillic to Latin transliteration dictionary.
- docker-compose.yml: Docker Compose configuration file.
- Dockerfile: Dockerfile to build the Docker image.
- example.env: Example environment variable configuration file.
- interactive_transliterate.py: Script for interactive transliteration.
- requirements.txt: List of Python dependencies.
- telegram_bot.py: Script for the Telegram bot.
- transliterate.py: Transliteration library module.
- transliterate_tsv.py: Script to transliterate words in a TSV file.
To convert the JSON text corpus to a TSV wordlist, run:
python convert_json_to_tsv.py
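For orientation, the following is a minimal sketch of how a JSON corpus could be reduced to a TSV wordlist. It is not the code in convert_json_to_tsv.py. It assumes the corpus is a flat JSON list of text strings and that the wordlist holds one lowercase word per row with a frequency count. The names `build_wordlist` and `write_tsv` and the `word`/`count` columns are hypothetical.

```python
import csv
import io
import json
import re
from collections import Counter

# Assumption: a word is a run of Unicode letters. The Cyrillic palochka
# (U+04C0 / U+04CF) is a letter, so it is matched as part of a word.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_wordlist(texts):
    """Count lowercase word forms across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(m.group(0).lower() for m in WORD_RE.finditer(text))
    return counts


def write_tsv(counts, out):
    """Write word/count rows, most frequent first, ties in alphabetical order."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["word", "count"])  # hypothetical column names
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([word, n])


if __name__ == "__main__":
    # Stand-in for the contents of corpora_texts.json (real structure may differ).
    sample = json.loads('["Нохчийн мотт", "нохчийн литература"]')
    buf = io.StringIO()
    write_tsv(build_wordlist(sample), buf)
    print(buf.getvalue(), end="")
```

If the real corpus nests its texts under keys, only the part that produces the iterable of strings would need to change.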
To transliterate words in a TSV file, run:
python transliterate_tsv.py
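As an illustration of dictionary-based transliteration, here is a hedged sketch using greedy longest-match lookup, which lets multi-character Cyrillic sequences take priority over single letters. It is not the algorithm in transliterate.py, whose rules are not described here. `TOY_MAPPING` is an invented placeholder, not the contents of cyrl_latn_dictionary.json, and the function names are hypothetical.

```python
def transliterate(word, mapping):
    """Transliterate `word` by greedy longest-match lookup in `mapping`."""
    max_len = max(map(len, mapping), default=1)
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate chunk first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += size
                break
        else:
            # No mapping entry matched: pass the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


def transliterate_rows(rows, mapping):
    """Append a transliterated form of the first column to each TSV row."""
    for row in rows:
        yield row + [transliterate(row[0], mapping)]


# Placeholder values for illustration only; not a real Chechen Latin scheme.
TOY_MAPPING = {
    "н": "n", "о": "o", "х": "x", "ч": "ch",
    "и": "i", "й": "y", "хь": "h",
}

if __name__ == "__main__":
    for row in transliterate_rows([["нохчийн", "2"], ["хьо", "1"]], TOY_MAPPING):
        print("\t".join(row))
```

Longest-match ordering matters here: with the toy mapping, `хь` is consumed as one unit instead of `х` followed by an unmapped `ь`.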
To run the interactive transliteration script, run:
python interactive_transliterate.py
To run the Telegram bot, ensure your environment variables are set correctly in .env, and run:
python telegram_bot.py
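A missing variable is easier to diagnose when the program fails at startup with a clear message. The sketch below shows one way to do that. The variable name `TELEGRAM_BOT_TOKEN` and the helper `require_env` are hypothetical, so check example.env for the names the bot actually expects. Python does not read .env on its own, so the values must already be in the process environment, for example exported by the shell or injected by Docker Compose.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file"
        )
    return value


if __name__ == "__main__":
    # Hypothetical variable name, used for illustration only.
    token = require_env("TELEGRAM_BOT_TOKEN")
    print("Token loaded, length:", len(token))
```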
- Install dependencies:
pip install -r requirements.txt
- Run the scripts as needed.
- Set up your environment variables in .env.
- Build and run the project using Docker Compose:
docker compose up -d