deep-keyphrase

Implementations of keyphrase generation algorithms.

Description

Implemented Paper

CopyRNN

Deep Keyphrase Generation (Meng et al., 2017)

ToDo List

CopyCNN

CopyTransformer

Usage

required files (4 files in total)

vocab_file: one word per line (do not include an index column)
```
this
paper
proposes
```
training, validation and test files (one file each)
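
As a rough illustration of the vocabulary file format described above, the sketch below writes one token per line and reads it back into a token-to-index mapping, where the index is implied by the line position. The function names `write_vocab` and `load_vocab` are hypothetical and are not part of deep-keyphrase; how the library itself treats special tokens is not covered here.

```python
import os
import tempfile


def write_vocab(tokens, path):
    """Write one token per line, with no index column."""
    with open(path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")


def load_vocab(path):
    """Map each token to its zero-based line position, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        tokens = [line.rstrip("\n") for line in f if line.strip()]
    return {token: idx for idx, token in enumerate(tokens)}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
    write_vocab(["this", "paper", "proposes"], path)
    print(load_vocab(path))
```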

data format for training, valid and test

JSON Lines format: every line is a dict with a 'tokens' list and a 'keyphrases' list of token lists:

{'tokens': ['this', 'paper', 'proposes', 'using', 'virtual', 'reality', 'to', 'enhance', 'the', 'perception', 'of', 'actions', 'by', 'distant', 'users', 'on', 'a', 'shared', 'application', '.', 'here', ',', 'distance', 'may', 'refer', 'either', 'to', 'space', '(', 'e.g.', 'in', 'a', 'remote', 'synchronous', 'collaboration', ')', 'or', 'time', '(', 'e.g.', 'during', 'playback', 'of', 'recorded', 'actions', ')', '.', 'our', 'approach', 'consists', 'in', 'immersing', 'the', 'application', 'in', 'a', 'virtual', 'inhabited', '3d', 'space', 'and', 'mimicking', 'user', 'actions', 'by', 'animating', 'avatars', '.', 'we', 'illustrate', 'this', 'approach', 'with', 'two', 'applications', ',', 'the', 'one', 'for', 'remote', 'collaboration', 'on', 'a', 'shared', 'application', 'and', 'the', 'other', 'to', 'playback', 'recorded', 'sequences', 'of', 'user', 'actions', '.', 'we', 'suggest', 'this', 'could', 'be', 'a', 'low', 'cost', 'enhancement', 'for', 'telepresence', '.'] ,
'keyphrases': [['telepresence'], ['animation'], ['avatars'], ['application', 'sharing'], ['collaborative', 'virtual', 'environments']]}
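
To produce records in this shape, a minimal sketch using Python's standard `json` module is shown below. `json.dumps` emits double-quoted strings, which is what a strict JSON parser expects; whether deep-keyphrase also accepts the single-quoted form shown above is not confirmed here. The helper names `write_jsonl` and `read_jsonl` are hypothetical.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write each record as a single JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read records back, checking the two expected keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            # 'tokens' is a flat list; 'keyphrases' is a list of token lists.
            assert isinstance(record["tokens"], list)
            assert all(isinstance(kp, list) for kp in record["keyphrases"])
            records.append(record)
    return records


if __name__ == "__main__":
    sample = {
        "tokens": ["this", "paper", "proposes", "using", "virtual", "reality"],
        "keyphrases": [["telepresence"], ["application", "sharing"]],
    }
    path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
    write_jsonl([sample], path)
    print(read_jsonl(path))
```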

Training

Download the KP20k dataset, then run the following commands:

mkdir data
mkdir data/raw
mkdir data/raw/kp20k_new
# !! unzip the kp20k data and put the files into the folder above manually
python -m nltk.downloader punkt
bash scripts/prepare_kp20k.sh
bash scripts/train_copyrnn_kp20k.sh

# start tensorboard
# enter the experiment result directory; the suffix is the experiment start time
cd data/kp20k/copyrnn_kp20k_basic-20191212-080000
# start the tensorboard service
tensorboard --bind_all --logdir logs --port 6006
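
Because the result directory name ends with the experiment start time, the most recent run can be located programmatically. The sketch below assumes the suffix follows the `YYYYMMDD-HHMMSS` pattern seen in the example directory name above; the helpers `parse_start_time` and `latest_experiment` are hypothetical and not part of deep-keyphrase.

```python
from datetime import datetime


def parse_start_time(dirname):
    """Return the start time encoded in the directory suffix, or None."""
    parts = dirname.rsplit("-", 2)
    if len(parts) != 3:
        return None
    try:
        return datetime.strptime(parts[1] + "-" + parts[2], "%Y%m%d-%H%M%S")
    except ValueError:
        return None


def latest_experiment(dirnames):
    """Pick the directory whose suffix encodes the latest start time."""
    dated = [(parse_start_time(d), d) for d in dirnames]
    dated = [(t, d) for t, d in dated if t is not None]
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    runs = [
        "copyrnn_kp20k_basic-20191212-080000",
        "copyrnn_kp20k_basic-20191213-093000",
        "raw",  # ignored: no timestamp suffix
    ]
    print(latest_experiment(runs))
```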

Notes

Compared with the original seq2seq-keyphrase-pytorch:
1. Fixes implementation errors:
  
  the copy mechanism
  
  the mismatch between training and inference (training did not use input feeding, while inference did)
2. Easier data preparation
3. TensorBoard support
4. Faster beam search (6x faster on CPU and more than 10x faster on GPU)
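
For readers unfamiliar with the beam search mentioned in item 4, the following is a minimal, framework-free sketch of the general technique. It is not the implementation used in this repository and says nothing about how the reported speedups were achieved; the `beam_search` helper, the toy `toy_step` transition table, and the token names are all assumptions made for illustration.

```python
import math


def beam_search(step_fn, bos, eos, beam_size=2, max_len=5):
    """Keep the `beam_size` best partial sequences by total log-probability.

    `step_fn(prefix)` returns a dict mapping each next token to its probability.
    """
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in step_fn(prefix).items():
                if prob <= 0.0:
                    continue  # log(0) is undefined; such tokens cannot be chosen
                candidates.append((prefix + [token], score + math.log(prob)))
        # Keep only the highest-scoring expansions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses if max_len was hit
    return sorted(finished, key=lambda item: item[1], reverse=True)


def toy_step(prefix):
    """Toy next-token distribution that depends only on the last token."""
    table = {
        "<s>": {"virtual": 0.6, "shared": 0.4},
        "virtual": {"reality": 0.9, "</s>": 0.1},
        "shared": {"application": 0.5, "</s>": 0.5},
        "reality": {"</s>": 1.0},
        "application": {"</s>": 1.0},
    }
    return table[prefix[-1]]


if __name__ == "__main__":
    for seq, score in beam_search(toy_step, "<s>", "</s>"):
        print(" ".join(seq), round(math.exp(score), 3))
```

A larger `beam_size` explores more alternatives per step at a proportionally higher cost, which is why the efficiency of this loop matters in practice.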
