Skip to content

Latest commit

 

History

History
46 lines (37 loc) · 2 KB

README.md

File metadata and controls

46 lines (37 loc) · 2 KB

Search Engine for Movies

Domain specific Information Retrieval System

Problem Statement:

The task is to build a search engine which will cater to the needs of a particular domain. You have to feed your IR model with documents containing information about the chosen domain. It will then process the data and build indexes. Once this is done, the user will give a query as an input. You are supposed to return top 10 relevant documents as the output.

About the project

Dataset used - Kaggle-movie-plots

Have a look at the file Design Architecture. It includes the concepts used along with the modified implementation of the TF-IDF ranking.

Project By:


How to run the code

  1. Clone the repository : git@github.com:JuiP/Information-Retrieval.git

  2. cd Information-Retrieval

  3. Run files in the order:

           python3 preprocess.py
           python3 tfidf.py
           python3 server.py
    
  4. In your browser go to http://0.0.0.0:3000/

  5. Type your query in the search bar and wait till it returns the relevant documents :)


Dependencies/modules used

  • time
  • nltk
  • pandas
  • pickle
  • Numpy
  • heapq
  • flask
  • os