Freely available datasets for Linguistics and NLP tasks.
This repository contains freely available datasets for linguistic research and NLP tasks. Some of the datasets were gathered personally for specific purposes, such as the detection of hate speech, subjectivity in news, or irony. External datasets are part of different projects and are shared by their researchers. They are hosted on the Edinburgh DataShare site (https://datashare.ed.ac.uk). Each dataset is shared in its original format.
If you use any of the datasets, please consult the Reference document to cite them properly.