This is the Pytorch implementation for our paper Improving Cross-Lingual Sentiment Analysis via Conditional Language Adversarial Training of Neural Networks.
This work is accepted to the 3rd Workshop on Research in Computational Typology and Multilingual NLP (SIGTYP) at NAACL 2021.
Sentiment analysis has come a long way for high-resource languages due to the availability of large annotated corpora. However, it still suffers from lack of training data for low-resource languages. To tackle this problem, we propose Conditional Language Adversarial Network (CLAN), an end-to-end neural architecture for cross-lingual sentiment analysis without cross-lingual supervision. CLAN differs from prior work in that it allows the adversarial training to be conditioned on both learned features and the sentiment prediction, to increase discriminativity for learned representation in the cross-lingual setting. Experimental results demonstrate that CLAN outperforms previous methods on the multilingual multi-domain Amazon review dataset.
- Pytorch v1.4.0
- PyYAML v5.3.1
- NumPy v1.15.2
- Mecab v0.996.3 (Japanese tokenization)
- NLTK v3.4.5 (English / French / German tokenization)
Download the Amazon review dataset:
git clone https://github.com/hemanthkandula/Conditional-Language-Adversaral-Networks.git
cd Conditional-Language-Adversaral-Networks
wget -P data/ https://zenodo.org/record/3251672/files/cls-acl10-unprocessed.tar.gz
tar xvf data/cls-acl10-unprocessed.tar.gz -C data/
Then run the following script to preprocess data:
python helper_utils/pre_process.py
Using all language data
python train_clan_id.py --gpu_id 0 --sup_dom music --seed 0
Adapting specific languages
python train_clan_id.py --gpu_id 0 --source_lang en --target_lang ja --seed 0
python train_clan_cd.py --gpu_id 0 --source_lang en --target_lang de --source_domain dvd --target_domain music --seed 0
@inproceedings{kandula2021improving,
title={Improving Cross-Lingual Sentiment Analysis via Conditional Language Adversarial Nets},
author={Kandula, Hemanth and Min, Bonan},
booktitle={Proceedings of the Third Workshop on Computational Typology and Multilingual NLP},
pages={32--37},
year={2021}
}