This repository contains the SmokEng dataset for tobacco-related classification of tweets, along with the experiments used in "SmokPro: Towards Tobacco Product Identification in Social Media Text".
Label Mapping:
- -1: Narcotic Mentions
- 0: Ambivalent Mentions
- 4: General Tobacco Mentions
- 5: Traditional Tobacco Product Mentions
- 6: Modern Tobacco Product Mentions
Each row contains the tweet text along with its corresponding label. Please note that hashtags and mentions have been removed from the tweets to preserve user privacy.
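
As a rough illustration of how the label mapping might be applied when reading the data, here is a minimal Python sketch. It assumes the data is a CSV file with columns named `text` and `label`; those column names, the `read_rows` helper, and the sample rows are hypothetical and not taken from this repository, so adjust them to match the actual file.

```python
import csv
import io

# Label IDs and names as listed in the label mapping above.
LABEL_NAMES = {
    -1: "Narcotic Mentions",
    0: "Ambivalent Mentions",
    4: "General Tobacco Mentions",
    5: "Traditional Tobacco Product Mentions",
    6: "Modern Tobacco Product Mentions",
}


def read_rows(handle, text_col="text", label_col="label"):
    """Yield (text, label_id, label_name) tuples from a CSV file handle.

    The column names are assumptions; change them to match the real header.
    """
    for row in csv.DictReader(handle):
        label_id = int(row[label_col])
        yield row[text_col], label_id, LABEL_NAMES[label_id]


# Made-up placeholder rows for demonstration only, not real dataset content.
sample = io.StringIO(
    "text,label\n"
    "placeholder tweet one,5\n"
    "placeholder tweet two,-1\n"
)

rows = list(read_rows(sample))
for text, label_id, label_name in rows:
    print(f"{label_id:>2}  {label_name}: {text}")
```

To read a real file, the in-memory `sample` can be replaced with a handle from `open(path, newline="", encoding="utf-8")`. An unknown label ID raises a `KeyError`, which can be a convenient way to catch rows that do not match the mapping.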