XLM: Cross-lingual Language Model Pretraining

An implementation of Cross-lingual Language Model Pretraining (XLM) in PyTorch. You can choose from the following three training objectives: causal language model, masked language model, and translation language model.

Settings

This code depends on the packages listed in requirements.txt. Clone the repository and install them:

git clone https://github.com/t080/pytorch-xlm.git
cd ./pytorch-xlm
pip install -r requirements.txt

To train a causal language model or a masked language model, pass a monolingual corpus (.txt) to the --train option.

# Use --task masked to train a masked language model instead.
python train.py \
  --task causal \
  --train /path/to/train.txt \
  --savedir ./checkpoints \
  --gpu
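
The two monolingual objectives differ only in how inputs and prediction targets are built from a sentence. The following is a minimal sketch in plain Python, not the code used in this repository: the function names, the `<mask>` symbol, and the `vocab` argument are hypothetical. The 15% selection rate and the 80/10/10 replacement split follow the scheme described in the XLM paper.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from this repository


def causal_example(tokens):
    """Causal LM: each position predicts the next token."""
    return tokens[:-1], tokens[1:]


def masked_example(tokens, vocab, mask_prob=0.15, rng=None):
    """Masked LM: predict a random subset of tokens from corrupted input.

    Returns (inputs, targets); targets is None at positions that are
    not selected for prediction.
    """
    rng = rng or random.Random()
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)  # this position must be predicted
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)  # 80%: replace with the mask symbol
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)  # 10%: keep the original token
        else:
            inputs.append(tok)
            targets.append(None)  # position is ignored by the loss
    return inputs, targets
```

For example, `causal_example(["a", "b", "c"])` pairs the input `["a", "b"]` with the target `["b", "c"]`, while `masked_example` keeps the sequence length unchanged and marks only the selected positions as targets.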

To train a translation language model, pass a parallel corpus (.tsv) to the --train option.

python train.py \
  --task translation \
  --train /path/to/train.tsv \
  --savedir ./checkpoints \
  --gpu
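
If your parallel data is stored as two line-aligned text files, it has to be merged into a single .tsv first. The sketch below assumes that each line of the .tsv holds one source sentence and one target sentence separated by a tab; this layout is an assumption, so check how train.py reads the file before relying on it. The helper name `write_parallel_tsv` is hypothetical.

```python
from itertools import zip_longest


def write_parallel_tsv(src_path, tgt_path, out_path):
    """Merge two line-aligned files into one tab-separated file.

    Assumed layout (not confirmed by this repository):
    one "source<TAB>target" pair per line.
    Returns the number of sentence pairs written.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt, \
         open(out_path, "w", encoding="utf-8") as out:
        for s, t in zip_longest(src, tgt):
            # zip_longest pads the shorter file with None
            if s is None or t is None:
                raise ValueError("source and target files differ in length")
            s, t = s.rstrip("\n"), t.rstrip("\n")
            if "\t" in s or "\t" in t:
                raise ValueError("sentences must not contain tab characters")
            out.write(f"{s}\t{t}\n")
            count += 1
    return count
```

Calling `write_parallel_tsv("train.en", "train.fr", "train.tsv")` (file names are placeholders) produces a file that can then be passed to the --train option.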
