This is an implementation of the experiments and combination system presented in:
- Kelly Marchisio, Neha Verma, Kevin Duh, and Philipp Koehn. 2022. IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6019–6033, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
If you use this software for academic research, please cite the paper above.
- python3
- pytorch
- sklearn
- scipy
- numpy
- indic-nlp-library
- torchtext
- Download third party packages:
cd third_party && sh get_third_party.sh && cd ..
- Note: If you're on a Mac with an M1 chip, word2vec might not build. You can fix this by changing -march=native to -mcpu=apple-m1 in word2vec's makefile, and replacing fgetc_unlocked/fputc_unlocked with getc_unlocked/putc_unlocked. You'll also need to use gshuf instead of shuf within src/train.py.
- Download and make data:
cd data && sh make_data.sh
- Download and make train/dev/test dictionaries:
cd data/dicts && sh create_dicts.sh
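The two text substitutions from the M1 note above could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the function names are hypothetical, and the file paths are left as arguments because this README does not say where word2vec's makefile and sources sit under third_party.

```shell
# Sketch only: patch_makefile and patch_unlocked_io are hypothetical helpers.
# Pass the file to patch as $1; its location under third_party/ is not
# specified in this README.

# Swap the -march=native flag for -mcpu=apple-m1.
patch_makefile() {
    sed 's/-march=native/-mcpu=apple-m1/g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Replace fgetc_unlocked/fputc_unlocked with getc_unlocked/putc_unlocked.
patch_unlocked_io() {
    sed -e 's/fgetc_unlocked/getc_unlocked/g' \
        -e 's/fputc_unlocked/putc_unlocked/g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Writing to a temporary file and moving it back avoids sed's in-place flag, which behaves differently on macOS and Linux. The gshuf change in src/train.py is not covered here and still needs to be made by hand.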
To reproduce Table 1 in the paper (Baselines), run:
sh baseline.sh $system $lang $seed
- For instance, run
sh baseline.sh w2v uk
for official word2vec trained on Ukrainian.
- system choices: {isovec, w2v}
- lang choices: {uk, bn, ta, en}
- After you train baseline w2v spaces for English and Ukrainian, for instance, you can map them and evaluate dictionary precision with:
sh map-and-eval.sh baseline w2v uk en dev
- Results will be in
exps/baseline/w2v/uk-en/*out
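To repeat the baseline steps for every language, a small wrapper along these lines may help. It is a sketch under assumptions: DRY_RUN, run, and baseline_pipeline are hypothetical names rather than part of the repository, and it assumes that map-and-eval.sh accepts bn and ta in the same argument positions as uk in the example above. By default it only prints the commands.

```shell
# Sketch only: DRY_RUN, run, and baseline_pipeline are hypothetical.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Train the English space once, then train and map each other language.
baseline_pipeline() {
    system=$1    # isovec or w2v
    run sh baseline.sh "$system" en
    for lang in uk bn ta; do
        run sh baseline.sh "$system" "$lang"
        # Assumption: the argument order matches the uk-en example.
        run sh map-and-eval.sh baseline "$system" "$lang" en dev
    done
}

baseline_pipeline w2v
```

Run it from the directory that contains baseline.sh, and set DRY_RUN=0 once the printed commands look right.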
To run IsoVec in reference to a fixed embedding space (main experiments):
- Example Goal: Train a Ukrainian embedding space with RSIM-U, in reference to a fixed English space.
- Step 1: Train the fixed English space with
sh baseline.sh isovec en
- Step 2: Train the Ukrainian space with:
sh run-isovec.sh rsim-u uk en
- Choices of IsoVec training algorithm are
l2, proc-l2, proc-l2-init, rsim, rsim-init, rsim-u, evs-u
for L2, Proc-L2, Proc-L2+Init, RSIM, RSIM+Init, RSIM-U, and EVS-U, as detailed in Sections 4.3 and 4.4 of the paper.
- Step 3: Map & Evaluate the spaces with:
sh map-and-eval.sh isovec rsim-u uk en dev
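The three steps above can be generated for any algorithm and language pair with a helper like the one below. This is a sketch, not a script shipped with the repository: isovec_commands is a hypothetical name, and it only prints the commands (after checking the algorithm against the choices listed above) so they can be reviewed before running.

```shell
# Sketch only: isovec_commands is a hypothetical helper.
# Prints the Step 1-3 commands for one algorithm, language, and reference.
isovec_commands() {
    algo=$1
    lang=$2
    ref=$3
    # Reject anything outside the algorithm choices listed in this README.
    case "$algo" in
        l2|proc-l2|proc-l2-init|rsim|rsim-init|rsim-u|evs-u) ;;
        *) echo "unknown algorithm: $algo" >&2; return 1 ;;
    esac
    echo "sh baseline.sh isovec $ref"                      # Step 1
    echo "sh run-isovec.sh $algo $lang $ref"               # Step 2
    echo "sh map-and-eval.sh isovec $algo $lang $ref dev"  # Step 3
}

isovec_commands rsim-u uk en
```

To execute the steps, the output can be saved and run with early exit on failure, for example `isovec_commands rsim-u uk en > steps.sh && sh -e steps.sh`.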