Skip to content

Commit

Permalink
Merge pull request #183 from boudinfl/v2.0
Browse files Browse the repository at this point in the history
V2.0
  • Loading branch information
boudinfl authored Mar 7, 2022
2 parents f651015 + 67b46dc commit b17d424
Show file tree
Hide file tree
Showing 66 changed files with 642 additions and 137,444 deletions.
1 change: 0 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,5 +5,4 @@ docs/build/html/_sources/*
dev/
.pytest_cache/
/venv/
/venv2/
\.idea/
2 changes: 0 additions & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,6 @@ python:
# command to install dependencies
install:
- pip install -r requirements.txt
- python -m nltk.downloader stopwords
- python -m nltk.downloader universal_tagset
- python -m spacy download en_core_web_sm
- python -m pip install cython
- python -m pip install git+https://github.com/epfml/sent2vec
Expand Down
46 changes: 20 additions & 26 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,16 +25,6 @@ To pip install `pke` from github:
pip install git+https://github.com/boudinfl/pke.git
```

`pke` also requires external resources that can be obtained using:

```bash
python -m nltk.downloader stopwords
python -m nltk.downloader universal_tagset
python -m spacy download en_core_web_sm # download the english model
```

As of April 2019, `pke` only supports Python 3.6+.

## Minimal example

`pke` provides a standardized API for extracting keyphrases from a document.
Expand All @@ -47,9 +37,9 @@ import pke
# initialize keyphrase extraction model, here TopicRank
extractor = pke.unsupervised.TopicRank()

# load the content of the document, here document is expected to be in raw
# format (i.e. a simple text file) and preprocessing is carried out using spacy
extractor.load_document(input='/path/to/input.txt', language='en')
# load the content of the document, here document is expected to be a simple
# test string and preprocessing is carried out using spacy
extractor.load_document(input='text', language='en')

# keyphrase candidate selection, in the case of TopicRank: sequences of nouns
# and adjectives (i.e. `(Noun|Adj)*`)
Expand All @@ -67,29 +57,33 @@ A detailed example is provided in the [`examples/`](examples/) directory.

## Getting started

Tutorials and code documentation are available at
[https://boudinfl.github.io/pke/](https://boudinfl.github.io/pke/).
To get your hands dirty with `pke`, we invite you to try our tutorials out.

| Name | Link |
| ---------------------------------------------- | ---------- |
| Getting started with `pke` and keyphrase extraction | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-1-graph-based-keyphrase-extraction.ipynb) |
| Model parameterization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-2-parameterization.ipynb) |
| Benchmarking models | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-3-benchmarking-models.ipynb) |

## Implemented models

`pke` currently implements the following keyphrase extraction models:

* Unsupervised models
* Statistical models
* TfIdf [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#tfidf)]
* KPMiner [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#kpminer), [article by (El-Beltagy and Rafea, 2010)](http://www.aclweb.org/anthology/S10-1041.pdf)]
* YAKE [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#yake), [article by (Campos et al., 2020)](https://doi.org/10.1016/j.ins.2019.09.013)]
* FirstPhrases
* TfIdf
* YAKE [(Campos et al., 2020)](https://doi.org/10.1016/j.ins.2019.09.013)
* Graph-based models
* TextRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#textrank), [article by (Mihalcea and Tarau, 2004)](http://www.aclweb.org/anthology/W04-3252.pdf)]
* SingleRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#singlerank), [article by (Wan and Xiao, 2008)](http://www.aclweb.org/anthology/C08-1122.pdf)]
* TopicRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#topicrank), [article by (Bougouin et al., 2013)](http://aclweb.org/anthology/I13-1062.pdf)]
* TopicalPageRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#topicalpagerank), [article by (Sterckx et al., 2015)](http://users.intec.ugent.be/cdvelder/papers/2015/sterckx2015wwwb.pdf)]
* PositionRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#positionrank), [article by (Florescu and Caragea, 2017)](http://www.aclweb.org/anthology/P17-1102.pdf)]
* MultipartiteRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#multipartiterank), [article by (Boudin, 2018)](https://arxiv.org/abs/1803.08721)]
* TextRank [(Mihalcea and Tarau, 2004)](http://www.aclweb.org/anthology/W04-3252.pdf)
* SingleRank [(Wan and Xiao, 2008)](http://www.aclweb.org/anthology/C08-1122.pdf)
* TopicRank [(Bougouin et al., 2013)](http://aclweb.org/anthology/I13-1062.pdf)
* TopicalPageRank [(Sterckx et al., 2015)](http://users.intec.ugent.be/cdvelder/papers/2015/sterckx2015wwwb.pdf)
* PositionRank [(Florescu and Caragea, 2017)](http://www.aclweb.org/anthology/P17-1102.pdf)
* MultipartiteRank [(Boudin, 2018)](https://arxiv.org/abs/1803.08721)
* Supervised models
* Feature-based models
* Kea [[documentation](https://boudinfl.github.io/pke/build/html/supervised.html#kea), [article by (Witten et al., 2005)](https://www.cs.waikato.ac.nz/ml/publications/2005/chap_Witten-et-al_Windows.pdf)]
* WINGNUS [[documentation](https://boudinfl.github.io/pke/build/html/supervised.html#wingnus), [article by (Nguyen and Luong, 2010)](http://www.aclweb.org/anthology/S10-1035.pdf)]
* Kea [(Witten et al., 2005)](https://www.cs.waikato.ac.nz/ml/publications/2005/chap_Witten-et-al_Windows.pdf)

## Citing pke

Expand Down
7 changes: 7 additions & 0 deletions examples/2.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Waiting for the wave to crest [wavelength services]
Wavelength services have been hyped ad nauseam for years. But despite their
quick turn-up time and impressive margins, such services have yet to
live up to the industry's expectations. The reasons for this lukewarm
reception are many, not the least of which is the confusion that still
surrounds the technology, but most industry observers are still
convinced that wavelength services with ultimately flourish
Loading

0 comments on commit b17d424

Please sign in to comment.