This is a simple library for extracting keywords from text, with or without a reference corpus.

ankushbhatia2/keyword-extract

KeyWord Extractor Python

This is a simple library for extracting keywords from a document, optionally using a reference corpus. Built with Python 3.5.
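This README does not describe how the library scores words internally. As a rough illustration of the general idea behind corpus-informed keyword extraction, the sketch below uses a simple TF-IDF-style heuristic. This is an assumption, not necessarily what KeyWords does: a word scores highly when it is frequent in the document but rare in the reference corpus. The helpers tokenize and corpus_weighted_scores are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters/apostrophes
    return re.findall(r"[a-z']+", text.lower())


def corpus_weighted_scores(document, corpus, stop_words=()):
    """Hypothetical TF-IDF-style score: document frequency times corpus rarity."""
    stop = set(stop_words)
    doc_counts = Counter(w for w in tokenize(document) if w not in stop)
    corpus_counts = Counter(tokenize(corpus))
    corpus_total = sum(corpus_counts.values())
    doc_total = sum(doc_counts.values()) or 1  # avoid dividing by zero on empty input
    scores = {}
    for word, count in doc_counts.items():
        tf = count / doc_total
        # Add-one smoothing: words absent from the corpus get the highest rarity
        rarity = math.log((corpus_total + 1) / (corpus_counts[word] + 1))
        scores[word] = tf * rarity
    return scores


scores = corpus_weighted_scores(
    "the cat sat on the cat mat",
    "the dog sat on the mat",
    stop_words=["the", "on"],
)
print(max(scores, key=scores.get))  # prints "cat"
```

Here "cat" wins because it is the most frequent non-stop word in the document and never appears in the corpus, while "sat" and "mat" are penalized for appearing in both.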

DEPENDENCIES:

NLTK

INSTALLATION:

Installing dependencies:
    pip install --upgrade nltk
    In the Python interpreter, run:
        import nltk
        nltk.download()
    Then download the required packages: the POS tagger model and, optionally, the stopwords corpus (needed for the usage example below).

Installing the package:
    Download or clone the repository.
    On the command line, from the repository directory, run: python setup.py install

USAGE:

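from keywords import KeyWords
from nltk.corpus import stopwords

# Document to extract keywords from
with open('script.txt', 'r', encoding="utf8") as f:
    data = f.read()

# Optional corpus used to prioritize keywords
with open('transcript_1.txt', 'r', encoding="utf8") as f1:
    corpus_1 = f1.read()

stop_words = stopwords.words('english')  # requires the NLTK stopwords corpus
keyword = KeyWords(corpus=corpus_1, stop_words=stop_words, alpha=0.8)

# Each result holds the keyword at index 0 and its score at index 1
results = keyword.get_keywords(data, n=20)
for item in results:
    print("Keyword : %s \n Score : %f" % (item[0], item[1]))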

Functionalities:

__init__:

Arguments:
    corpus (optional): Your own corpus (a string of text), used to prioritize keywords based on that corpus.
    stop_words: Your own list of stop words.
    alpha (default 0.5): Weighting factor that decides how much weight is given to the corpus.
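The README does not specify exactly how alpha combines the corpus with the document. One common pattern for a weighting factor like this is linear interpolation, sketched below purely as an assumption: each word's final score is (1 - alpha) * document_score + alpha * corpus_score, so alpha = 0 ignores the corpus and alpha = 1 relies on it entirely. The function blend_scores and both input dictionaries are hypothetical.

```python
def blend_scores(document_scores, corpus_scores, alpha=0.5):
    """Hypothetical linear interpolation of two per-word score dicts.

    alpha = 0.0 -> document scores only; alpha = 1.0 -> corpus scores only.
    Words missing from corpus_scores are treated as scoring 0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return {
        word: (1 - alpha) * score + alpha * corpus_scores.get(word, 0.0)
        for word, score in document_scores.items()
    }


doc = {"python": 0.6, "library": 0.4}
ref = {"python": 0.2, "library": 0.9}
blended = blend_scores(doc, ref, alpha=0.8)
print(max(blended, key=blended.get))  # prints "library"
```

With alpha=0.8, as in the usage example, the corpus term dominates: "python" ranks first on document scores alone, but "library" ranks first after blending.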

get_keywords:

Arguments:
    text: Your document, as a string.
    n (default 20): Number of keywords to extract.
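The usage example reads each returned item as a keyword at index 0 and a score at index 1. A minimal sketch of how such a top-n list could be produced from a dictionary of word scores uses heapq.nlargest from the standard library. This is an assumption about the shape of the result, not the library's code, and top_n_keywords is a hypothetical helper.

```python
import heapq


def top_n_keywords(scores, n=20):
    """Return the n highest-scoring (keyword, score) pairs, best first."""
    # nlargest avoids sorting every word when n is small relative to the vocabulary
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


scores = {"keyword": 0.9, "extract": 0.7, "corpus": 0.4, "python": 0.1}
for item in top_n_keywords(scores, n=2):
    print("Keyword : %s \n Score : %f" % (item[0], item[1]))
```

If n exceeds the number of scored words, nlargest simply returns every pair, so callers never get an error for asking for too many keywords.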