samuelbenichou/SearchEngine

Tweet Search Engine

Description

The dataset contains 10 million tweets about the coronavirus. The primary goal is, given a query, to retrieve the most relevant tweets from the corpus with the shortest possible runtime. To that end, I used multithreading throughout the preprocessing stages.

Preprocess:

  • reader: reads the tweets from the dataset
  • parser_module: parses the tweets according to the rules of the class
  • stemmer: stems the tokens using Porter's stemming algorithm
  • indexer: creates the posting files and stores them on disk
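
The stages above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names (`parse`, `naive_stem`, `preprocess`, `build_index`), the tokenization rule, and the in-memory index layout are all assumptions, and `naive_stem` is a toy suffix stripper standing in for Porter's algorithm. Note that in CPython, threads mostly help with I/O-bound work such as reading files; CPU-bound parsing gains less because of the global interpreter lock.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tokenization rule: letters, digits, hashtags, mentions, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def parse(text):
    """Lowercase a tweet and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def naive_stem(token):
    """Toy suffix stripper; a stand-in for Porter's stemmer, not equivalent to it."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Parse and stem one (tweet_id, text) pair."""
    tweet_id, text = tweet
    return tweet_id, [naive_stem(token) for token in parse(text)]


def build_index(tweets, workers=4):
    """Build an in-memory inverted index: term -> {tweet_id: term frequency}."""
    index = defaultdict(dict)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tweets are preprocessed in worker threads; the index itself is
        # updated only in the calling thread, so no lock is needed.
        for tweet_id, terms in pool.map(preprocess, tweets):
            for term in terms:
                index[term][tweet_id] = index[term].get(tweet_id, 0) + 1
    return dict(index)


if __name__ == "__main__":
    sample = [(1, "Masks are working"), (2, "Working from home")]
    print(build_index(sample))
```

A real indexer would write the postings to disk in batches instead of keeping the whole index in memory.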

Model:

  • ranker: I implemented multiple models to rank tweets, supported by GloVe, Word2Vec, WordNet, a spell checker and a thesaurus.
  • searcher: returns the relevant tweets
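
One common way to rank with word embeddings such as GloVe or Word2Vec is to average the word vectors of the query and of each tweet and sort tweets by cosine similarity. The sketch below illustrates that idea only; it is not necessarily how the ranker in this repository works. The `EMBEDDINGS` table holds made-up 3-dimensional vectors, and the names `average_vector`, `cosine` and `search` are hypothetical.

```python
import math

# Made-up 3-dimensional vectors for illustration; real GloVe or Word2Vec
# embeddings are learned from data and have far more dimensions.
EMBEDDINGS = {
    "vaccine": [0.9, 0.1, 0.0],
    "shot": [0.8, 0.2, 0.1],
    "mask": [0.1, 0.9, 0.0],
    "lockdown": [0.0, 0.2, 0.9],
}


def average_vector(terms):
    """Average the vectors of all known terms; None if no term is known."""
    vectors = [EMBEDDINGS[term] for term in terms if term in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_terms, tweets, k=10):
    """Return up to k (tweet_id, score) pairs, best match first."""
    query_vector = average_vector(query_terms)
    if query_vector is None:
        return []
    scored = []
    for tweet_id, terms in tweets.items():
        tweet_vector = average_vector(terms)
        if tweet_vector is not None:
            scored.append((cosine(query_vector, tweet_vector), tweet_id))
    # Highest score first; ties are broken by tweet id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(tweet_id, score) for score, tweet_id in scored[:k]]


if __name__ == "__main__":
    corpus = {1: ["shot"], 2: ["mask"], 3: ["lockdown"]}
    print(search(["vaccine"], corpus, k=2))
```

Because "vaccine" and "shot" have similar vectors in this toy table, a query for one can retrieve tweets containing the other even without an exact term match.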

💡 Prerequisite

Python 3.7

🛠️ Installation

With GitHub

git clone https://github.com/samuelbenichou/SearchEngine.git
cd SearchEngine/
python3 setup.py install