- Install Python 3.5+
- Install NLTK 3
- pickle ships with the Python standard library, so it needs no separate installation
- Install PyTrie
$ pip3 install nltk
$ pip3 install pytrie
Run the invidx_cons.py script to index the documents.
$ python3 invidx_cons.py <Location of documents> <indexfile name>
The index files are written to the directory that contains the Python code. invidx_cons.py serializes the list of all documents and the inverted index into <indexfile name>.dict and <indexfile name>.idx, using the pickle serializer module. vecsearch.py later reads these two files to answer queries.
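The following is a minimal sketch of how a term dictionary and an offset-addressed postings file could be written with pickle. It is not the code in invidx_cons.py: the function names (build_postings, write_index, read_postings), the whitespace tokenizer, and the exact file layout are assumptions made for illustration, and in-memory buffers stand in for the .dict and .idx files.

```python
import io
import pickle
from collections import defaultdict


def build_postings(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def write_index(postings, doc_ids, dict_file, idx_file):
    """Write postings lists to idx_file and the term dictionary to dict_file."""
    dictionary = {}
    for term in sorted(postings):
        offset = idx_file.tell()  # byte position where this postings list starts
        pickle.dump(postings[term], idx_file)
        dictionary[term] = (len(postings[term]), offset)  # (df, offset)
    # Hypothetical layout: document list and term dictionary in one pickled object.
    pickle.dump({"docs": doc_ids, "terms": dictionary}, dict_file)
    return dictionary


def read_postings(dictionary, term, idx_file):
    """Seek to the stored offset and load a single postings list."""
    _df, offset = dictionary[term]
    idx_file.seek(offset)
    return pickle.load(idx_file)


# In-memory buffers stand in for <indexfile name>.dict and <indexfile name>.idx.
docs = {"d1": "the quick brown fox", "d2": "the lazy dog", "d3": "quick dog"}
dict_buf, idx_buf = io.BytesIO(), io.BytesIO()
terms = write_index(build_postings(docs), sorted(docs), dict_buf, idx_buf)
print(read_postings(terms, "quick", idx_buf))
```

Storing a byte offset per term means a query only has to unpickle the postings lists it actually needs, rather than loading the whole index into memory.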
Run the printdict.py script to list the dictionary in the format <indexterm> : <df> : <offset-to-its-postingslist-in-idx-file>
$ python3 printdict.py <dict file name>
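As a rough illustration of the listing format described above, the sketch below formats a small dictionary as one line per term. The function name format_dictionary and the sample values are hypothetical, and the sketch assumes the dictionary maps each term to a (df, offset) pair, which may differ from what printdict.py actually loads.

```python
def format_dictionary(dictionary):
    """Return one '<indexterm> : <df> : <offset>' line per term, sorted by term."""
    return [
        "{} : {} : {}".format(term, df, offset)
        for term, (df, offset) in sorted(dictionary.items())
    ]


# Hypothetical dictionary contents: term -> (document frequency, byte offset).
sample = {"quick": (2, 57), "dog": (2, 19), "fox": (1, 38)}
lines = format_dictionary(sample)
for line in lines:
    print(line)
```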
To run a query search, run the vecsearch.py script and pass the path to the query file, the cutoff value, the output file name, the index file name, and the dictionary file name as arguments.
$ python3 vecsearch.py --query <query file address> --cutoff <cutoff value> --output <output file name> --index <index file name> --dict <dictionary file name>
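The flags in the command above could be parsed with argparse roughly as follows. This is a sketch, not the parser in vecsearch.py: the help texts, the integer type for --cutoff, the choice to make every flag required, and the sample file names are all assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the five vecsearch-style flags; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Run queries against a pickled index.")
    parser.add_argument("--query", required=True, help="path to the query file")
    # Assumption: the cutoff is an integer limit on the results per query.
    parser.add_argument("--cutoff", type=int, required=True, help="cutoff value")
    parser.add_argument("--output", required=True, help="output file name")
    parser.add_argument("--index", required=True, help="index (.idx) file name")
    # dest avoids an attribute named after the built-in dict type.
    parser.add_argument("--dict", dest="dict_file", required=True,
                        help="dictionary (.dict) file name")
    return parser.parse_args(argv)


# Hypothetical file names, used only to exercise the parser.
args = parse_args([
    "--query", "queries.txt",
    "--cutoff", "5",
    "--output", "results.txt",
    "--index", "myindex.idx",
    "--dict", "myindex.dict",
])
print(args.query, args.cutoff, args.output, args.index, args.dict_file)
```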