Implementations of several keyphrase generation algorithms
- CopyRNN: Deep Keyphrase Generation (Meng et al., 2017)
- CopyCNN
- CopyTransformer
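CopyRNN (Meng et al., 2017) adds a CopyNet-style copy mechanism to an attentional encoder-decoder, so words that appear in the source text but not in the vocabulary can still be emitted in a keyphrase. The sketch below shows only how the generate mode and the copy mode are merged into one distribution at a single decoding step. It assumes the unnormalised scores are already computed (in the model they come from the decoder state and the encoder states); `copy_distribution` and its arguments are hypothetical names, not this repository's API.

```python
import math
from typing import List


def copy_distribution(
    gen_scores: List[float],
    copy_scores: List[float],
    src_ext_ids: List[int],
    ext_vocab_size: int,
) -> List[float]:
    """Merge generate- and copy-mode scores into one distribution.

    gen_scores:     one unnormalised score per fixed-vocabulary word
    copy_scores:    one unnormalised score per source position
    src_ext_ids:    extended-vocabulary id of the token at each source position;
                    source OOV words get ids >= len(gen_scores)
    ext_vocab_size: fixed vocabulary size plus the number of source OOV words
    """
    # Both modes share a single softmax normaliser (CopyNet-style joint normalisation).
    all_scores = gen_scores + copy_scores
    max_score = max(all_scores)
    exps = [math.exp(s - max_score) for s in all_scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    out = [0.0] * ext_vocab_size
    # Generate-mode mass goes straight to fixed-vocabulary ids.
    for word_id, p in enumerate(probs[: len(gen_scores)]):
        out[word_id] += p
    # Copy-mode mass is scattered onto the id of the token at each source position,
    # so a word that occurs in both modes (or several times in the source) accumulates.
    for pos, p in enumerate(probs[len(gen_scores):]):
        out[src_ext_ids[pos]] += p
    return out
```

With equal scores everywhere, a source word that is also in the vocabulary receives mass from both modes, while a source OOV word (extended id 3 here) is reachable only by copying: `copy_distribution([0.0] * 3, [0.0] * 3, [1, 2, 3], 4)` gives `[1/6, 2/6, 2/6, 1/6]`.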
vocab_file: one word per line, with no index column. For example:
    this
    paper
    proposes
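A minimal sketch of writing and reading a vocabulary file in this format, assuming a word's id is its 0-based line number (the actual id scheme, including any special tokens, is defined by the repository's own code). The hypothetical `parse_vocab` rejects lines with more than one field, such as `paper\t1`, which is the "word plus index" layout the format rules out.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def format_vocab(words: List[str]) -> str:
    """Serialise a vocabulary: one word per line, no index column."""
    return "".join(word + "\n" for word in words)


def parse_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map each word to its 0-based line number (assumed id scheme).

    Raises ValueError on a line that is empty or holds more than one field,
    e.g. "paper\t1", since an index column is not allowed.
    """
    vocab: Dict[str, int] = {}
    for idx, line in enumerate(lines):
        word = line.rstrip("\r\n")
        if len(word.split()) != 1:
            raise ValueError(f"line {idx + 1}: expected exactly one word, got {word!r}")
        vocab[word] = idx
    return vocab


def save_vocab(words: List[str], path: str) -> None:
    Path(path).write_text(format_vocab(words), encoding="utf-8")


def load_vocab(path: str) -> Dict[str, int]:
    with open(path, encoding="utf-8") as f:
        return parse_vocab(f)
```

For the three-word example, `format_vocab(["this", "paper", "proposes"])` yields `"this\npaper\nproposes\n"`, and parsing it back gives `{"this": 0, "paper": 1, "proposes": 2}`.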
Training, validation and test files
JSON Lines format; each line is a dict:
{'tokens': ['this', 'paper', 'proposes', 'using', 'virtual', 'reality', 'to', 'enhance', 'the', 'perception', 'of', 'actions', 'by', 'distant', 'users', 'on', 'a', 'shared', 'application', '.', 'here', ',', 'distance', 'may', 'refer', 'either', 'to', 'space', '(', 'e.g.', 'in', 'a', 'remote', 'synchronous', 'collaboration', ')', 'or', 'time', '(', 'e.g.', 'during', 'playback', 'of', 'recorded', 'actions', ')', '.', 'our', 'approach', 'consists', 'in', 'immersing', 'the', 'application', 'in', 'a', 'virtual', 'inhabited', '3d', 'space', 'and', 'mimicking', 'user', 'actions', 'by', 'animating', 'avatars', '.', 'we', 'illustrate', 'this', 'approach', 'with', 'two', 'applications', ',', 'the', 'one', 'for', 'remote', 'collaboration', 'on', 'a', 'shared', 'application', 'and', 'the', 'other', 'to', 'playback', 'recorded', 'sequences', 'of', 'user', 'actions', '.', 'we', 'suggest', 'this', 'could', 'be', 'a', 'low', 'cost', 'enhancement', 'for', 'telepresence', '.'] , 'keyphrases': [['telepresence'], ['animation'], ['avatars'], ['application', 'sharing'], ['collaborative', 'virtual', 'environments']]}
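The example above is printed in Python dict style; in an actual JSON Lines file every string must be double-quoted, which `json.dumps` takes care of. The sketch below (hypothetical helper names) writes and reads examples with the `tokens` and `keyphrases` keys. It also includes `is_present`, which checks whether a keyphrase occurs as a contiguous token span; this uses exact matching without stemming, a simplification. In the record above, `['telepresence']` and `['avatars']` occur in the tokens, while `['application', 'sharing']` does not.

```python
import json
from typing import Dict, Iterable, Iterator, List


def to_json_line(tokens: List[str], keyphrases: List[List[str]]) -> str:
    """Serialise one example as a single JSON line (double-quoted strings)."""
    return json.dumps({"tokens": tokens, "keyphrases": keyphrases})


def read_json_lines(lines: Iterable[str]) -> Iterator[Dict[str, list]]:
    """Yield one example per non-blank line, checking that both keys are present."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        example = json.loads(line)
        missing = [key for key in ("tokens", "keyphrases") if key not in example]
        if missing:
            raise ValueError(f"line {lineno}: missing keys {missing}")
        yield example


def is_present(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous span of `tokens` (exact match)."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))
```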
Download the kp20k dataset.
    mkdir data
    mkdir data/raw
    mkdir data/raw/kp20k_new
    # !! please unzip the kp20k data and put the files into the folder above manually
    python -m nltk.downloader punkt
    bash scripts/prepare_kp20k.sh
    bash scripts/train_copyrnn_kp20k.sh

    # start tensorboard
    # enter the experiment result dir; its suffix is the time the experiment started
    cd data/kp20k/copyrnn_kp20k_basic-20191212-080000
    # start the tensorboard service
    tensorboard --bind_all --logdir logs --port 6006
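Because each run directory ends in its start time, the most recent run can be picked automatically instead of typing the timestamp by hand. This is a sketch that assumes the naming scheme `<experiment name>-<YYYYMMDD>-<HHMMSS>`, inferred only from the example directory `copyrnn_kp20k_basic-20191212-080000`; all function names are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Assumed layout: <experiment name>-<YYYYMMDD>-<HHMMSS>
RUN_DIR_RE = re.compile(r"^(?P<name>.+)-(?P<stamp>\d{8}-\d{6})$")


def parse_run_dir(dirname: str) -> Optional[Tuple[str, datetime]]:
    """Split a run directory name into (experiment name, start time), or None."""
    match = RUN_DIR_RE.match(dirname)
    if match is None:
        return None
    try:
        started = datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S")
    except ValueError:  # digits that do not form a valid date/time
        return None
    return match.group("name"), started


def latest_run(dirnames: Iterable[str], name: str) -> Optional[str]:
    """Return the most recently started run of experiment `name`, if any."""
    runs = []
    for dirname in dirnames:
        parsed = parse_run_dir(dirname)
        if parsed is not None and parsed[0] == name:
            runs.append((parsed[1], dirname))
    return max(runs)[1] if runs else None


def latest_run_dir(root: str, name: str) -> Optional[Path]:
    """Filesystem wrapper: search the directories directly under `root`."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    found = latest_run((p.name for p in root_path.iterdir() if p.is_dir()), name)
    return root_path / found if found else None
```

For example, `latest_run_dir("data/kp20k", "copyrnn_kp20k_basic")` would return the newest CopyRNN run, whose `logs` subdirectory can then be passed to `tensorboard --logdir`.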
- Compared with the original seq2seq-keyphrase-pytorch:
  - fixes implementation errors in:
    - the copy mechanism
    - the mismatch between training and inference (training did not use input feeding, while inference did)
  - easier data preparation
  - TensorBoard support
  - faster beam search (6x faster on CPU and more than 10x faster on GPU)
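Keyphrase generators such as CopyRNN decode several candidate keyphrases per document with beam search and rank them by score. The plain-Python sketch below illustrates the search itself over a hypothetical `step_fn` that returns next-token log-probabilities for a given prefix; it is not this repository's implementation and says nothing about how its batched CPU/GPU speedups are achieved.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (token sequence, summed log-prob)
# Hypothetical decoder interface: prefix -> {next token: log-probability}
StepFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(step_fn: StepFn, beam_size: int, max_len: int, eos: str = "</s>") -> List[Hypothesis]:
    """Return finished (and max-length) hypotheses, best score first."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, logp in step_fn(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the beam_size best extensions overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # A hypothesis ending in EOS is one complete keyphrase.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses still open at max_len
    return sorted(finished, key=lambda c: c[1], reverse=True)
```

Every returned sequence is a ranked candidate keyphrase; in this variant a hypothesis that reaches `eos` leaves the beam, so the number of live hypotheses can shrink as phrases complete.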