This is the repository for the paper Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations, presented at COLING 2020 by Simone Conia and Roberto Navigli.
To date, the most successful word, word sense, and concept modelling techniques have used large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias, with their strength depending on the amount of data available across languages. In this paper we address this issue and propose Conception, a novel technique for building language-independent vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. Our approach results in high-coverage representations that outperform the state of the art in multilingual and cross-lingual Semantic Word Similarity and Word Sense Disambiguation, proving particularly robust on low-resource languages.
You can download a copy of all the files in this repository by cloning the git repository:
git clone https://github.com/SapienzaNLP/conception.git
If you want to recreate or use the vectors, you will need the BabelNet APIs. You can download everything you need from the official BabelNet website. We highly recommend also downloading the BabelNet indices (they are free to use for research purposes) to speed up the process.
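Once vectors are available, a typical use is scoring the similarity of two concepts. This README does not describe the format of the released vectors, so the following is only a minimal sketch. It assumes each vector can be loaded as a mapping from concept identifiers to weights. That representation, the identifiers, the weights, and the function name `cosine_similarity` are all hypothetical and are not part of this repository or of the BabelNet APIs.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts.

    Each dict maps a concept identifier (str) to a weight (float).
    This layout is an assumption made for illustration only.
    """
    # Iterate over the smaller dict so the dot product touches fewer keys.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    # An empty or all-zero vector has no direction; treat it as dissimilar.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up toy vectors; the keys are placeholders, not real BabelNet synset IDs.
dog = {"concept:animal": 0.8, "concept:pet": 0.6}
cat = {"concept:animal": 0.6, "concept:pet": 0.8}
car = {"concept:vehicle": 1.0}

print(cosine_similarity(dog, cat))  # high: the vectors share both dimensions
print(cosine_similarity(dog, car))  # zero: no dimension in common
```

Because every dimension in this sketch is keyed by a concept identifier, you can inspect which shared concepts contribute to a score, which is the kind of readability that explicit concept dimensions allow.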
@inproceedings{conia-and-navigli-2020-conception,
title = {{C}onception: {M}ultilingually-Enhanced, Human-Readable Concept Vector Representations},
author = {Conia, Simone and Navigli, Roberto},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020},
year = {2020}
}