This repository holds two Python pickle dictionaries containing labeled data for cross-source, cross-domain sentiment analysis. One file contains English texts and the other contains Italian texts.
Dataset_ENG is composed of:
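As a minimal sketch of how such a pickle dictionary could be opened and inspected: the file name `Dataset_ENG.pkl` and the helper names `load_dataset` and `summarize` are assumptions made for illustration, and the top-level keys and value types may differ from what is shown here. Only unpickle files obtained from a source you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle


def load_dataset(path):
    """Unpickle one dataset file and return the stored object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarize(dataset):
    """Map each top-level key to the number of items stored under it.

    Values without a length are reported as None.
    """
    return {
        key: len(value) if hasattr(value, "__len__") else None
        for key, value in dataset.items()
    }


# Hypothetical file name: adjust it to the actual file in this repository.
PATH = "Dataset_ENG.pkl"

if os.path.exists(PATH):
    dataset_eng = load_dataset(PATH)
    # Print the top-level keys with their sizes to discover the real layout.
    print(summarize(dataset_eng))
```

Printing the summary first is a cheap way to confirm the actual structure before writing any code that depends on specific keys.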
-
Amazon: a sample of 75,000 reviews of different Amazon products (such as electronic devices, kitchen objects, clothes and house accessories), written in English and collected from January to February 2018. Each review is accompanied by its date, its short title and the sentiment (expressed as a 5-star rating) given by the user who wrote the review. For privacy reasons, the user name is omitted.
-
Tripadvisor: a sample of 75,000 English reviews of hotels, restaurants and cities, downloaded from Tripadvisor.com between January and February 2018. Each review is accompanied by its date, its short title and the sentiment (expressed as a 5-star rating) given by the user who wrote the review. For privacy reasons, the user name is omitted.
-
Facebook: 5,782 English Facebook posts. The posts come only from specific public pages that have a 5-star rating system. The reviews, sampled from January to February 2018, cover several topics, namely universities, events, famous people, locals, parties, shops and cities. Each item in the collection is accompanied by the sentiment (expressed as a 5-star rating) given by the user. For privacy reasons, the user name is omitted.
Dataset_ITA is composed of:
-
Amazon: a sample of 75,000 reviews of different Amazon products (such as electronic devices, kitchen objects, clothes and house accessories), written in Italian and collected from January to February 2018. Each review is accompanied by its date, its short title and the sentiment (expressed as a 5-star rating) given by the user who wrote the review. For privacy reasons, the user name is omitted.
-
Tripadvisor: a sample of 75,000 Italian reviews of hotels, restaurants and cities, downloaded from Tripadvisor.com between January and February 2018. Each review is accompanied by its date, its short title and the sentiment (expressed as a 5-star rating) given by the user who wrote the review. For privacy reasons, the user name is omitted.
-
Facebook: 1,077 Italian Facebook posts. The posts come only from specific public pages that have a 5-star rating system. The reviews, sampled from January to February 2018, cover several topics, namely universities, events, famous people, locals, parties, shops and cities. Each item in the collection is accompanied by the sentiment (expressed as a 5-star rating) given by the user. For privacy reasons, the user name is omitted.
-
Twitter: a sample of 937 manually labeled Italian tweets. The sample was collected in April 2018 and covers Italian television shows and other more general topics. Each tweet has a three-class sentiment label: negative, neutral or positive.
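Because the Amazon, Tripadvisor and Facebook items carry 5-star ratings while the tweets carry three-class labels, comparing across sources may require collapsing the stars into the same three classes. The sketch below shows one common convention (1-2 stars negative, 3 neutral, 4-5 positive); these thresholds and the function name `stars_to_polarity` are assumptions for illustration and are not necessarily the mapping used in the cited paper.

```python
def stars_to_polarity(stars):
    """Collapse a 1-5 star rating into negative / neutral / positive.

    Assumed thresholds: 1-2 negative, 3 neutral, 4-5 positive.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"expected a rating between 1 and 5, got {stars!r}")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


# Example: convert a handful of ratings.
ratings = [5, 1, 3, 4, 2]
print([stars_to_polarity(r) for r in ratings])
```

Keeping the thresholds in one small function makes it easy to swap in a different mapping, for example a binary one that drops 3-star reviews.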
If you use these datasets, please cite:
Zola, P., Cortez, P., Ragno, C., & Brentari, E. (2019). Social Media Cross-Source and Cross-Domain Sentiment Classification. International Journal of Information Technology & Decision Making.
Thank you!