MergedQUAD provides splits for SQuAD-style question answering in Hindi. It combines examples taken from other multilingual SQuAD-based question answering datasets such as XQuAD and TyDi QA. The dataset was introduced in our paper "Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages", which was accepted as a workshop paper at ML-RSA (NeurIPS 2020). The paper presents an exhaustive study of transformer-based architectures on Indian languages such as Hindi, Bengali and Telugu. Our models are available on the HuggingFace model hub.
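Because the data is SQuAD-based, the sketch below flattens the standard SQuAD JSON layout (`data` → `paragraphs` → `qas` → `answers`) into one row per question. It assumes the MergedQUAD files follow that layout exactly, which is not confirmed here. The helper names `load_squad_file` and `flatten_squad` and the in-memory sample record are hypothetical, made up for illustration and not taken from the dataset.

```python
import json


def load_squad_file(path):
    """Read a SQuAD-format JSON file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def flatten_squad(dataset):
    """Turn nested SQuAD-style JSON into a flat list with one dict per question."""
    rows = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                rows.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answer_texts": [a["text"] for a in answers],
                    "answer_starts": [a["answer_start"] for a in answers],
                })
    return rows


# Illustrative sample only: this record is invented, not drawn from MergedQUAD.
sample = {
    "data": [{
        "title": "example",
        "paragraphs": [{
            "context": "दिल्ली भारत की राजधानी है।",
            "qas": [{
                "id": "q1",
                "question": "भारत की राजधानी क्या है?",
                "answers": [{"text": "दिल्ली", "answer_start": 0}],
            }],
        }],
    }]
}

rows = flatten_squad(sample)
```

In SQuAD-style data, `answer_start` is a character offset into `context`, so slicing the context at that offset should reproduce the answer text. That gives a quick sanity check when examples from different sources are merged.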
If you use this work, please cite:
@misc{jain2020indictransformers,
title={Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages},
author={Kushal Jain and Adwait Deshpande and Kumar Shridhar and Felix Laumann and Ayushman Dash},
year={2020},
eprint={2011.02323},
archivePrefix={arXiv},
primaryClass={cs.CL}
}