- Tested with Python 3.7.5
Step1. Install pytorch follow this link: https://pytorch.org/get-started/locally/
# Recent Pytorch installation script
# GPU version
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
# Training on CPU is not recommended. It's too slow!
Step2. Install all requirements
pip install -r requirements.txt
python scripts/train.py
Evaluate the last trained model
python scripts/evaluate.py
Edit the scripts/generate.py to pass input and output filepath.
python scripts/generate.py
# evaluate TurkCorpus and ASSET datasets
python scripts/evaluate_4tokens_pretrained_model.py
Used tokens: C: 0.95 L: 0.75 WR: 0.75 DTD: 0.75
Expected results:
TurkCorpus: SARI: 43.50 BLEU: 67.26 FKGL: 6.17
ASSET: SARI: 45.05 BLEU: 72.24 FKGL: 6.33
# edit the file to change input and output files. Default TurkCorpus complex is used as an example.
python scripts/generate_with_4tokens_pretrained_model.py
# evaluate TurkCorpus and ASSET datasets
python scripts/evaluate_5tokens_pretrained_model.py
Used tokens: W: 1.05 C: 0.95 L: 0.75 WR: 0.75 DTD: 0.75
Expected results:
TurkCorpus: SARI: 43.34 BLEU: 65.54 FKGL: 5.55
ASSET: SARI: 45.26 BLEU: 72.19 FKGL: 5.92
# edit the file to change input and output files. Default TurkCorpus complex is used as an example.
python scripts/generate_with_5tokens_pretrained_model.py
If you make use of the code in this repository, please cite the following papers:
@inproceedings{sheang-saggion-2021-controllable,
title = "Controllable Sentence Simplification with a Unified Text-to-Text Transfer Transformer",
author = "Sheang, Kim Cheng and Saggion, Horacio",
booktitle = "Proceedings of the 14th International Conference on Natural Language Generation",
month = aug,
year = "2021",
address = "Aberdeen, Scotland, UK",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.inlg-1.38",
pages = "341--352"
}
The preprocessing code was adopted from https://github.com/facebookresearch/access