My implementation of "Hierarchical Attention Networks for Document Classification" (Yang et al., 2016).
The Yelp data is downloaded from Duyu Tang's homepage (the same dataset used in Yang et al.'s paper):
Download link: http://ir.hit.edu.cn/~dytang/paper/emnlp2015/emnlp-2015-data.7z
- Put the data in a directory named "data/yelp_YEAR/" (where "YEAR" is the dataset year, e.g. "data/yelp2013/").
- Run "yelp-preprocess.ipynb" to preprocess the data. The output format is "label \t\t sentence1 \t sentence2 ...".
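The preprocessed format above can be parsed with a few lines of Python; this is a minimal sketch (the notebook's actual reader may differ), where the label and the toy review line are made up for illustration:

```python
# Parse one preprocessed line of the form "label \t\t sentence1 \t sentence2 ...".
def parse_line(line):
    label, text = line.rstrip("\n").split("\t\t", 1)     # label is before the double tab
    sentences = [s.split() for s in text.split("\t") if s]  # sentences are tab-separated
    return int(label), sentences

# Hypothetical example line (not from the real dataset):
label, sents = parse_line("4\t\tgreat food\tfriendly staff")
```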
- Then run "word2vec.ipynb" to train a word2vec model on the training set.
- Run "HAN.ipynb" to train the model.
- Run "case_study.ipynb" to visualize a few examples from the validation set, showing the sentence-level and word-level attention weights along with the predicted labels.
We currently get about 65% accuracy on the yelp2013 test set; further hyperparameter tuning may improve this.
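The word-level attention visualized in the case study follows the paper's formulation: u_it = tanh(W_w h_it + b_w), alpha_it = softmax(u_it^T u_w), s_i = sum_t alpha_it h_it. A NumPy sketch with illustrative shapes and random (untrained) weights:

```python
import numpy as np

def word_attention(H, W_w, b_w, u_w):
    # H: (T, d) GRU hidden states for one sentence of T words
    U = np.tanh(H @ W_w + b_w)        # (T, a) hidden representation of each word
    scores = U @ u_w                  # (T,) similarity with the word context vector
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()              # (T,) attention weights (softmax)
    return alpha, alpha @ H           # weights to visualize, sentence vector (d,)

# Illustrative dimensions and random weights, not the trained model's.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
alpha, s = word_attention(H, rng.normal(size=(8, 8)), np.zeros(8), rng.normal(size=8))
```

The `alpha` vector is what gets rendered in the case study as per-word highlighting; sentence-level attention is computed the same way over sentence vectors.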
epochs | batch size | GRU units | word2vec size | optimizer | learning rate | maximum sentence length |
---|---|---|---|---|---|---|
50 | 32 | 128 | 50 | Adam | 4e-4 | 200 |
train accuracy | validation accuracy | test accuracy |
---|---|---|
66.891% | 64.659% | 64.734% |