This repository contains the code and data for the EMNLP 2021 paper "The Causal Direction of Data Collection Matters: Implications of Causal and Anticausal Learning for NLP" by Zhijing Jin*, Julius von Kügelgen*, Jingwei Ni, Tejas Vaidhya, Ayush Kaushal, Mrinmaya Sachan, Bernhard Schoelkopf.
To cite the paper, use the BibTeX entry below:
@inproceedings{jin2021causal,
author = {Zhijing Jin
and Julius von Kuegelgen
and Jingwei Ni
and Tejas Vaidhya
and Ayush Kaushal
and Mrinmaya Sachan
and Bernhard Schoelkopf},
title = {Causal Direction of Data Collection Matters: {I}mplications of Causal and Anticausal Learning for {NLP}},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2021},
publisher = {Association for Computational Linguistics},
year = {2021},
url = {https://arxiv.org/abs/2110.03618},
}
- Code for MDL (paper Section 4.2):
mdl/
with data in this zip file (24.3MB)
- Code for decipherment experiments (paper Section 5.1):
decipher/
- Code for meta-study significance tests (paper Sections 5.2 and 6):
meta_study_significance_test.py
with annotated data in this Google Spreadsheet