Character-level Text Classification

Implementation of character-level deep neural networks for text classification. Three models (CNN, VDCNN, and GRU) are evaluated on four binary text classification datasets (Blog Authorship Corpus, PAN13, PAN14, and Enron Email Dataset). Results:

Model	Blogs	PAN13	PAN14	Enron
CNN	65%	55%	69%	57%
VDCNN	66%	74%	67%	64%
GRU	62%	60%	63%	62%

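Overall, the VDCNN model is the most accurate: it scores highest on three of the four datasets, with CNN slightly ahead only on PAN14. The GRU model is the most consistent, with results between 60% and 63% on every dataset.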
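The comparison above can be checked with a few lines of Python. This is only a sketch: it copies the numbers from the results table, reads them as accuracy percentages, and uses the mean and the max-minus-min spread as illustrative measures of "most accurate" and "most consistent". The names `results` and `summarize` are made up for this example and are not part of the repository.

```python
from statistics import mean

# Percentages copied from the results table (read here as accuracy).
results = {
    "CNN":   {"Blogs": 65, "PAN13": 55, "PAN14": 69, "Enron": 57},
    "VDCNN": {"Blogs": 66, "PAN13": 74, "PAN14": 67, "Enron": 64},
    "GRU":   {"Blogs": 62, "PAN13": 60, "PAN14": 63, "Enron": 62},
}


def summarize(scores):
    """Return the mean score and the max-min spread across datasets."""
    values = list(scores.values())
    return {"mean": mean(values), "spread": max(values) - min(values)}


summary = {model: summarize(scores) for model, scores in results.items()}

for model, stats in summary.items():
    print(f"{model}: mean {stats['mean']:.2f}%, spread {stats['spread']} points")
```

Under these measures VDCNN has the highest mean (67.75%) and GRU the smallest spread (3 points).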

Installation

A working Python 3 installation is assumed. Install the required packages using:

pip install -r requirements.txt

Note that requirements.txt references the tensorflow-gpu package, because training the models on a GPU is recommended. If you do not have a GPU, install the tensorflow package instead.
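If you are unsure which of the two packages applies to your machine, a rough heuristic can help. The sketch below assumes that finding the `nvidia-smi` tool on the PATH means an NVIDIA driver and GPU are present. That is an assumption, not a guarantee that TensorFlow can use the GPU. The function name `pick_tensorflow_package` is hypothetical and not part of this repository.

```python
import shutil


def pick_tensorflow_package(which=shutil.which):
    """Suggest a pip package name based on a simple PATH lookup.

    `which` is injectable so the heuristic can be exercised without a GPU.
    """
    # Assumption: nvidia-smi on PATH implies an NVIDIA driver is installed.
    has_nvidia_gpu = which("nvidia-smi") is not None
    return "tensorflow-gpu" if has_nvidia_gpu else "tensorflow"


if __name__ == "__main__":
    print(pick_tensorflow_package())
```

The printed name is only a suggestion for which package to install with pip.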

Usage

Download the training data using:

./download.sh

Run the preprocessing steps using:

./process.sh

Now, you can train a model using:

./train.py -a vdcnn -d blogs pan13_tr_en

Use train.py -h for more information.
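For readers new to character-level models, the sketch below illustrates the general idea of turning raw text into a fixed-length sequence of character indices, which is the kind of input such networks typically consume. It is not the preprocessing performed by process.sh: the alphabet, the reserved index 0, the default `max_len`, and the name `encode_text` are all assumptions made for illustration.

```python
import string

# Hypothetical alphabet; index 0 is reserved for padding and unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_text(text, max_len=16):
    """Map text to exactly max_len integer indices (truncate, then pad with 0)."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))


if __name__ == "__main__":
    print(encode_text("Hello, world!"))
```

Each index would typically be fed to an embedding layer or expanded to a one-hot vector before the convolutional or recurrent layers.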
