Meta-Curriculum

Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation (AAAI 2021)

Update: The OPUS corpus has some data-quality problems. For a fair comparison, we suggest using the results reported in a subsequent work (Table 2) as the reference. Its authors reproduced the experiments on a cleaner benchmark and further improved the performance. We appreciate their effort in checking the results.

Citation

Please cite as:

@inproceedings{zhan2021metacl,
  title={Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation},
  author={Zhan, Runzhe and Liu, Xuebo and Wong, Derek F. and Chao, Lidia S.},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={16},
  month={May}, 
  year={2021},
  pages={14310-14318}
}

Requirements and Installation

This implementation is based on fairseq (v0.6.2) and partial code from Sharaf, Hassan, and Daumé III (2020).

PyTorch version >= 1.2.0
Python version >= 3.6
CUDA & cudatoolkit >= 9.0

git clone https://github.com/NLP2CT/Meta-Curriculum
cd Meta-Curriculum
pip install --editable .
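
Before installing, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch and is not part of the repository: `parse_version` and `meets_minimum` are hypothetical helpers, and the comparison only looks at the leading numeric components of a version string.

```python
import re
import sys


def parse_version(version):
    """Turn a version string such as '1.2.0+cu92' into a tuple of ints."""
    # Keep only the leading dotted-number part; suffixes like '+cu92' are ignored.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (both version strings)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '1.2' compares equal to '1.2.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("Python >= 3.6:", meets_minimum(python_version, "3.6"))
    try:
        import torch  # only importable once PyTorch is installed
        print("PyTorch >= 1.2.0:", meets_minimum(torch.__version__, "1.2.0"))
    except ImportError:
        print("PyTorch is not installed")
```

The CUDA requirement is not checked here, since how it is reported depends on the local driver and toolkit setup.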

Pipeline

Train a baseline model following the procedure in examples/translation/README.md. See our script general_train.sh, which is also used for baseline fine-tuning.
Use the scripts in the folder lm_score/general_domain_script to train a general-domain NLM.
Fine-tune the domain-specific NLMs following the script lm_score/finetune_lm/continue_lm_domain.sh.
Score the adaptation divergence for the domain corpus:

CUDA_VISIBLE_DEVICES=0 python lm_score/finetune_lm/score.py --general-lm GENERAL_DOMAIN_NLM_PATH --domain-lms DOMAIN_NLMs_PATH --bpe-code BPE_CODE --data-path DOMAIN_CORPUS_PATH --domains [DOMAIN1, DOMAIN2, ...]

Please note that you may need to run the LM training/scoring separately with a newer version of fairseq (>= 0.9.0) because of API changes.
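
The scoring step compares how a general-domain LM and a domain-specific LM rate each sentence. The sketch below shows one common way to express such a comparison, assuming a Moore-Lewis-style cross-entropy difference. The exact formula used by lm_score/finetune_lm/score.py is not stated in this README and may differ. The function names and the toy log-probabilities are hypothetical.

```python
def cross_entropy(token_logprobs):
    """Per-token cross-entropy (in nats) from a list of token log-probabilities."""
    if not token_logprobs:
        raise ValueError("sentence has no tokens")
    return -sum(token_logprobs) / len(token_logprobs)


def divergence_score(general_logprobs, domain_logprobs):
    """Cross-entropy under the general LM minus cross-entropy under the domain LM.

    ASSUMPTION: a Moore-Lewis-style difference. A larger value means the domain
    LM finds the sentence more predictable than the general LM does.
    """
    return cross_entropy(general_logprobs) - cross_entropy(domain_logprobs)


# Toy log-probabilities for one three-token sentence (made-up numbers).
general = [-4.0, -3.0, -5.0]  # the general-domain LM is less confident
domain = [-1.0, -2.0, -3.0]   # the domain LM is more confident
score = divergence_score(general, domain)
print(score)
```

In practice the per-token log-probabilities would come from the two trained LMs rather than from hand-written lists.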

Prepare the meta-learning dataset using meta_data_prep.py:

python meta_data_prep.py --data-path DOMAIN_DATA_PATH --split-dir META_SPLIT_SAVE_DIR \
                              --spm-model SPM_MODEL_PATH --k-support N --k-query N \
                              --meta_train_task N --meta_test_task N \
                              --unseen-domains [UNSEEN_DOMAINS ...] \
                              --seen-domains [SEEN_DOMAINS ...]
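
Meta-learning data is usually organized into tasks, each with a support set (used to adapt the model) and a query set (used to evaluate the adapted model). The sketch below illustrates that split on toy data. It is not the logic of meta_data_prep.py: `build_tasks` is a hypothetical helper, and it assumes the support and query sizes count sentence pairs, whereas the units behind --k-support and --k-query are not specified here.

```python
import random


def build_tasks(domain_pairs, num_tasks, k_support, k_query, seed=0):
    """Sample `num_tasks` tasks, each with a support set and a query set.

    domain_pairs maps a domain name to a list of (source, target) sentence pairs.
    ASSUMPTION: k_support and k_query count sentence pairs.
    """
    rng = random.Random(seed)
    domains = sorted(domain_pairs)
    tasks = []
    for _ in range(num_tasks):
        domain = rng.choice(domains)
        pairs = domain_pairs[domain]
        if len(pairs) < k_support + k_query:
            raise ValueError(f"domain {domain!r} has too few sentence pairs")
        # Draw without replacement so the same corpus entry is not used in both sets.
        sample = rng.sample(pairs, k_support + k_query)
        tasks.append({
            "domain": domain,
            "support": sample[:k_support],
            "query": sample[k_support:],
        })
    return tasks


toy_data = {
    "law": [(f"law src {i}", f"law tgt {i}") for i in range(10)],
    "medical": [(f"med src {i}", f"med tgt {i}") for i in range(10)],
}
tasks = build_tasks(toy_data, num_tasks=4, k_support=3, k_query=2)
for task in tasks:
    print(task["domain"], len(task["support"]), len(task["query"]))
```

Fixing the seed keeps the sampled tasks reproducible across runs.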

(Meta-Train) Train the meta-learning model with a curriculum using the script meta_train_ccl.sh.
Score the unseen domains using the script score_unseens.sh.
(Meta-Test) Fine-tune the meta-trained model with a curriculum using the script cl_finetune.sh.
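
Both the meta-train and meta-test steps rely on a curriculum, that is, an ordering of the training data. As a generic illustration of curriculum ordering (not the logic in meta_train_ccl.sh or cl_finetune.sh), the sketch below sorts scored sentences and splits them into stages. The function name, the number of stages, and the choice to present lower-scored examples first are all assumptions.

```python
def curriculum_stages(scored_examples, num_stages):
    """Split (example, score) pairs into at most `num_stages` stages ordered by score.

    ASSUMPTION: examples with lower scores are presented first.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(scored_examples, key=lambda item: item[1])
    # Ceiling division so every example lands in some stage.
    stage_size = max(1, -(-len(ordered) // num_stages))
    return [
        [example for example, _ in ordered[start:start + stage_size]]
        for start in range(0, len(ordered), stage_size)
    ]


scored = [("sent a", 0.9), ("sent b", 0.1), ("sent c", 0.5), ("sent d", 0.3)]
stages = curriculum_stages(scored, num_stages=2)
print(stages)
```

A training loop would then consume the stages in order, optionally keeping earlier stages in the pool as later ones are added.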

🌟 COVID-19 English-German Small-Scale Parallel Corpus

See covid19-ende/covid_de.txt and covid19-ende/covid_en.txt (raw data without preprocessing).
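
A minimal sketch of loading the two files as sentence pairs follows. It assumes the files are UTF-8 encoded and line-aligned with one sentence per line, which this README does not state explicitly. `read_parallel` is a hypothetical helper, and the demo writes its own temporary files so that it runs without a checkout of the repository.

```python
import os
import tempfile


def read_parallel(src_path, tgt_path):
    """Return a list of (source, target) sentence pairs from two line-aligned files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        src_lines = [line.rstrip("\n") for line in src]
        tgt_lines = [line.rstrip("\n") for line in tgt]
    # ASSUMPTION: line i of one file translates line i of the other.
    if len(src_lines) != len(tgt_lines):
        raise ValueError("files are not line-aligned")
    return list(zip(src_lines, tgt_lines))


# Demo on temporary files with made-up sentences.
with tempfile.TemporaryDirectory() as tmp:
    en_path = os.path.join(tmp, "en.txt")
    de_path = os.path.join(tmp, "de.txt")
    with open(en_path, "w", encoding="utf-8") as f:
        f.write("Wash your hands.\nStay at home.\n")
    with open(de_path, "w", encoding="utf-8") as f:
        f.write("Waschen Sie Ihre Hände.\nBleiben Sie zu Hause.\n")
    pairs = read_parallel(en_path, de_path)

print(pairs)
```

For the real corpus, the same function would be called with covid19-ende/covid_en.txt and covid19-ende/covid_de.txt.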
