rajatb115/Vector-Space-Retrieval

Vector space retrieval model with Python3

Get started

$ pip3 install nltk
$ pip3 install pytrie

pickle is part of the Python standard library and does not need to be installed separately.

Usage

Run the invidx_cons.py script to index documents.

$ python3 invidx_cons.py <Location of documents> <indexfile name>

The index files are written to the directory that contains the Python scripts. The list of all documents and the inverted index are serialized with the pickle module into <indexfile name>.dict and <indexfile name>.idx. Both files are produced by invidx_cons.py and are read by vecsearch.py to answer queries.
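To illustrate the idea, here is a minimal sketch of how an inverted index could be split across a postings file and a dictionary file with pickle. It is not the code in invidx_cons.py: the function names, the whitespace tokenizer, and the exact file layout (one pickled postings list per term in the .idx file, and a term -> (df, offset) mapping in the .dict file) are assumptions made for this example, and the document list is left out for brevity.

```python
import pickle
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency) pairs.

    `docs` is a hypothetical {doc_id: text} mapping; tokenization here is a
    plain lowercase whitespace split, which is a simplification.
    """
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        counts = Counter(docs[doc_id].lower().split())
        for term, tf in counts.items():
            postings[term].append((doc_id, tf))
    return postings


def write_index(postings, basename):
    """Write <basename>.idx and <basename>.dict (assumed layout, see above)."""
    dictionary = {}
    with open(basename + ".idx", "wb") as idx_file:
        for term in sorted(postings):
            offset = idx_file.tell()  # byte position where this postings list starts
            pickle.dump(postings[term], idx_file)
            dictionary[term] = (len(postings[term]), offset)  # (df, offset)
    with open(basename + ".dict", "wb") as dict_file:
        pickle.dump(dictionary, dict_file)
    return dictionary


def read_postings(basename, dictionary, term):
    """Seek to the stored offset and load only the postings list for `term`."""
    _df, offset = dictionary[term]
    with open(basename + ".idx", "rb") as idx_file:
        idx_file.seek(offset)
        return pickle.load(idx_file)
```

Storing an offset per term means a search only has to load the postings lists for the query terms instead of the whole index.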

Run the printdict.py script to list the dictionary, one entry per index term, in the format <indexterm> : <df> : <offset-to-its-postingslist-in-idx-file>, where <df> is the term's document frequency.

$ python3 printdict.py <dict file name>
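As a rough sketch of the output format, the snippet below formats a dictionary as <indexterm> : <df> : <offset> lines. It assumes the pickled dictionary is a plain term -> (df, offset) mapping, which may differ from what printdict.py actually reads; format_dictionary and print_dict_file are hypothetical names.

```python
import pickle


def format_dictionary(dictionary):
    """Return one '<indexterm> : <df> : <offset>' line per term, sorted by term."""
    return [
        f"{term} : {df} : {offset}"
        for term, (df, offset) in sorted(dictionary.items())
    ]


def print_dict_file(path):
    """Load a pickled term -> (df, offset) mapping (assumed layout) and print it."""
    with open(path, "rb") as handle:
        dictionary = pickle.load(handle)
    for line in format_dictionary(dictionary):
        print(line)
```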

To search, run the vecsearch.py script and pass the path of the query file, the cutoff value, the output file name, the index file name, and the dictionary file name as arguments.

$ python3 vecsearch.py --query <query file address> --cutoff <cutoff value> --output <output file name> --index <index file name> --dict <dictionary file name>
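For readers new to the vector space model, the following sketch ranks documents against a query by cosine similarity over tf-idf weights. It is an illustration under stated assumptions, not the logic of vecsearch.py: the weighting scheme ((1 + log10 tf) * log10(N / df)), the whitespace tokenizer, the in-memory documents, and the reading of the cutoff as the maximum number of results returned are all assumptions.

```python
import math
from collections import Counter


def rank(query, docs, cutoff):
    """Return up to `cutoff` (doc_id, score) pairs, best match first.

    `docs` is a hypothetical {doc_id: text} mapping held in memory; a real
    system would read postings from the index files instead.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))

    def weights(tokens):
        # tf-idf weight per term; terms unseen in the collection are ignored.
        counts = Counter(tokens)
        return {
            term: (1 + math.log10(tf)) * math.log10(n_docs / df[term])
            for term, tf in counts.items()
            if term in df
        }

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = weights(query.lower().split())
    q_norm = norm(q_vec)
    scores = []
    for doc_id, tokens in tokenized.items():
        d_vec = weights(tokens)
        denom = q_norm * norm(d_vec)
        if denom == 0:
            continue  # empty vector on either side: similarity is undefined
        dot = sum(w * d_vec.get(term, 0.0) for term, w in q_vec.items())
        if dot > 0:
            scores.append((doc_id, dot / denom))
    # Highest score first; ties broken by doc_id for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:cutoff]
```

For example, rank("cat", {"d1": "cat sat mat", "d2": "dog sat log", "d3": "cat cat hat"}, 10) places d3 ahead of d1 and omits d2, which shares no term with the query.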