Character-level Text Classification

Implementation of character-level deep neural networks for text classification. Three models (CNN, VDCNN and GRU) are evaluated on four binary text classification datasets (Blog Authorship Corpus, PAN13, PAN14 and Enron Email Dataset). Results:

| Model | Blogs | PAN13 | PAN14 | Enron |
|-------|-------|-------|-------|-------|
| CNN   | 65%   | 55%   | 69%   | 57%   |
| VDCNN | 66%   | 74%   | 67%   | 64%   |
| GRU   | 62%   | 60%   | 63%   | 62%   |

Overall, the VDCNN model is the most accurate (it scores highest on three of the four datasets), but the GRU model displays more consistent results, ranging only from 60% to 63%.
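
The comparison can be checked directly from the figures in the results table. The following is a minimal sketch that computes the mean accuracy and the max-minus-min spread for each model. Using the spread as a measure of consistency is an assumption made here for illustration, not a metric defined by this repository.

```python
# Accuracy figures (percent) copied from the results table.
results = {
    "CNN":   {"Blogs": 65, "PAN13": 55, "PAN14": 69, "Enron": 57},
    "VDCNN": {"Blogs": 66, "PAN13": 74, "PAN14": 67, "Enron": 64},
    "GRU":   {"Blogs": 62, "PAN13": 60, "PAN14": 63, "Enron": 62},
}


def summarize(scores):
    """Return (mean, spread), where spread is max - min across datasets."""
    values = list(scores.values())
    return sum(values) / len(values), max(values) - min(values)


summary = {model: summarize(scores) for model, scores in results.items()}

for model, (mean, spread) in summary.items():
    print(f"{model:<6} mean={mean:.2f}%  spread={spread} points")
```

Under this measure, VDCNN has the highest mean (67.75%), while GRU has the smallest spread (3 points, against 10 for VDCNN and 14 for CNN).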

Installation

A working Python 3 installation is assumed. Install the required packages using:

pip install -r requirements.txt

Note that requirements.txt references the tensorflow-gpu package, because training the models on a GPU is recommended. If no GPU is available, install the tensorflow package instead.
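
One way to make that swap without editing the file by hand is sketched below. The helper name use_cpu_tensorflow and the example file contents are hypothetical; the actual contents of requirements.txt are not shown in this README.

```python
import re


def use_cpu_tensorflow(requirements_text):
    """Swap the tensorflow-gpu requirement for tensorflow, keeping any version pin."""
    # Match 'tensorflow-gpu' only at the start of a line so other packages are untouched.
    return re.sub(r"(?m)^tensorflow-gpu\b", "tensorflow", requirements_text)


# Hypothetical file contents, for illustration only.
example = "numpy\ntensorflow-gpu\nscipy\n"
print(use_cpu_tensorflow(example))
```

The result could be written to a separate file (for example a hypothetical requirements-cpu.txt) and passed to pip install -r in place of the original.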

Usage

Download the training data using:

./download.sh

Run the preprocessing steps using:

./process.sh
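
This README does not describe what process.sh does internally. As a general illustration of the kind of encoding character-level models usually require, the sketch below maps text to a fixed-length sequence of character indices. The alphabet, MAX_LEN, and the encode helper are all assumptions made for illustration and are not taken from this repository.

```python
# Assumed alphabet and sequence length; both are illustrative choices.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;!?-"
MAX_LEN = 16
PAD = 0  # index reserved for padding and for characters outside the alphabet

# Indices start at 1 so that 0 stays free for PAD.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(text, max_len=MAX_LEN):
    """Map text to a list of exactly max_len character indices."""
    # Lowercase, truncate to max_len, and look up each character.
    indices = [CHAR_TO_INDEX.get(ch, PAD) for ch in text.lower()[:max_len]]
    # Right-pad short inputs so every sequence has the same length.
    return indices + [PAD] * (max_len - len(indices))


print(encode("Hi there!"))
```

Fixed-length index sequences like these are typically fed to an embedding or one-hot layer before the convolutional or recurrent layers.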

Now, you can train a model using:

./train.py -a vdcnn -d blogs pan13_tr_en

Run ./train.py -h for more information.