Meta-Curriculum

Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation (AAAI 2021)

Update: The OPUS corpus used in our experiments has data-quality issues. For a fair comparison, we suggest taking the results reported in a subsequent work (Table 2) as the reference; that work reproduced the experiments on a cleaner benchmark and further improved the performance. We appreciate their effort in checking the results.

Citation

Please cite as:

@inproceedings{zhan2021metacl,
  title={Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation},
  author={Zhan, Runzhe and Liu, Xuebo and Wong, Derek F. and Chao, Lidia S.},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={16},
  month={May}, 
  year={2021},
  pages={14310--14318}
}

Requirements and Installation

This implementation is based on fairseq (v0.6.2) and reuses parts of the code from Sharaf, Hassan, and Daume III (2020).

  • PyTorch version >= 1.2.0
  • Python version >= 3.6
  • CUDA & cudatoolkit >= 9.0
git clone https://github.com/NLP2CT/Meta-Curriculum
cd Meta-Curriculum
pip install --editable .
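
To confirm that the editable install picked up the bundled fairseq, a quick sanity check (ours, not a step from the original instructions):

python -c "import fairseq; print(fairseq.__version__)"  # expect 0.6.2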

Pipeline

  1. Train a baseline model following the standard procedure in examples/translation/README.md. See our script general_train.sh (it can also be used for baseline finetuning).
  2. Use the scripts contained in the folder lm_score/general_domain_script to train a general-domain NLM.
  3. Finetune the domain-specific NLM following the script lm_score/finetune_lm/continue_lm_domain.sh.
  4. Score the adaptation divergence for the domain corpora (a filled-in example follows the note below):
CUDA_VISIBLE_DEVICES=0 python lm_score/finetune_lm/score.py --general-lm GENERAL_DOMAIN_NLM_PATH --domain-lms DOMAIN_NLMS_PATH --bpe-code BPE_CODE --data-path DOMAIN_CORPUS_PATH --domains [DOMAIN1, DOMAIN2, ...]

Please note that you may need to run the LM training/scoring separately with a newer fairseq version (>= 0.9.0) due to API changes.
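
For concreteness, a hypothetical scoring invocation: the checkpoint and data paths below are illustrative only, and we assume --domains accepts a space-separated list of domain names:

CUDA_VISIBLE_DEVICES=0 python lm_score/finetune_lm/score.py \
    --general-lm checkpoints/lm_general/checkpoint_best.pt \
    --domain-lms checkpoints/lm_domains \
    --bpe-code data/bpe.code \
    --data-path data/domain_corpora \
    --domains emea jrc kde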

  5. Prepare the meta-learning data set using meta_data_prep.py (a filled-in example follows this list).
python meta_data_prep.py --data-path DOMAIN_DATA_PATH --split-dir META_SPLIT_SAVE_DIR
                              --spm-model SPM_MODEL_PATH --k-support N --k-query N
                              --meta_train_task N --meta_test_task N
                              --unseen-domains [UNSEEN_DOMAINS ...] 
                              --seen-domains [SEEN_DOMAINS ...]
  6. (Meta-Train) Train the meta-learning model with curriculum using the script meta_train_ccl.sh.
  7. Score the unseen domains using the script score_unseens.sh.
  8. (Meta-Test) Finetune the meta-trained model with curriculum using the script cl_finetune.sh.
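
As an illustration, a hypothetical meta-data preparation call: every path, task count, support/query size, and domain name below is a placeholder, not a value fixed by the repo:

python meta_data_prep.py --data-path data/domain_corpora \
                         --split-dir data/meta_splits \
                         --spm-model spm/sentencepiece.model \
                         --k-support 8 --k-query 8 \
                         --meta_train_task 100 --meta_test_task 20 \
                         --unseen-domains covid \
                         --seen-domains emea jrc kde

The remaining stages then run as shell scripts, e.g. bash meta_train_ccl.sh, followed by bash score_unseens.sh and bash cl_finetune.sh.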

🌟 COVID-19 English-German Small-Scale Parallel Corpus

See covid19-ende/covid_de.txt and covid19-ende/covid_en.txt (raw data, no preprocessing applied).
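
Assuming the two files are line-aligned (our reading of the file naming; please verify before use), a quick shell check:

wc -l covid19-ende/covid_de.txt covid19-ende/covid_en.txt    # line counts should match for an aligned bitext
paste covid19-ende/covid_en.txt covid19-ende/covid_de.txt | head -3    # inspect a few sentence pairs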
