Tweet Search Engine

Description

The dataset contained 10 million tweets about the coronavirus (COVID-19). The primary goal was, given a query, to retrieve the most relevant tweets from the corpus in the shortest possible runtime. To that end, I used a multithreaded programming model throughout the preprocessing stages.
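
As a rough illustration of the multithreaded preprocessing idea, the sketch below splits the tweets into chunks and hands each chunk to a worker thread. It is not the repository's code: `preprocess_chunk`, `preprocess_in_parallel`, and the sample tweets are hypothetical names and data chosen for this example, and the per-chunk work is reduced to lowercasing and whitespace splitting.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess_chunk(tweets):
    """Hypothetical stand-in for the per-chunk work (parsing, stemming, ...)."""
    return [tweet.lower().split() for tweet in tweets]


def preprocess_in_parallel(chunks, max_workers=4):
    # executor.map returns results in the same order as the input chunks,
    # even if the worker threads finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(preprocess_chunk, chunks))


# Made-up sample data, split into two chunks.
chunks = [
    ["Stay home, stay safe", "COVID cases are rising"],
    ["Vaccine trials begin"],
]
tokenized_chunks = preprocess_in_parallel(chunks)
```

In CPython, threads mostly help with I/O-bound steps such as reading dataset files from disk; for CPU-bound parsing, a process pool may be the better fit.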

Preprocess:

reader , Reads the tweets from the dataset
parser_module , Parses the tweets according to the rules of the class
stemmer , Stems the tweets using Porter's stemming algorithm
indexer , Creates and stores the posting file on the disk
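
The indexing step can be pictured with a minimal inverted-index sketch. This is an assumption about the general technique, not the repository's `indexer`: the function names, the JSON-lines file layout, and the sample documents are all hypothetical, and the input is assumed to be already parsed and stemmed.

```python
import json
import os
import tempfile
from collections import defaultdict


def build_index(documents):
    """Map each term to {tweet_id: term frequency in that tweet}."""
    index = defaultdict(dict)
    for tweet_id, terms in documents.items():
        for term in terms:
            index[term][tweet_id] = index[term].get(tweet_id, 0) + 1
    return dict(index)


def write_postings(index, path):
    # One JSON object per line, sorted by term, so that several such
    # files could later be combined with a sorted merge.
    with open(path, "w", encoding="utf-8") as f:
        for term in sorted(index):
            f.write(json.dumps({"term": term, "postings": index[term]}) + "\n")


# Made-up sample: tweet id -> list of already-stemmed terms.
documents = {
    "t1": ["covid", "vaccine", "covid"],
    "t2": ["vaccine", "trial"],
}
index = build_index(documents)
postings_path = os.path.join(tempfile.mkdtemp(), "postings.jsonl")
write_postings(index, postings_path)
```

Keeping each posting file sorted by term is what makes a disk-based merge of partial indexes practical when the corpus does not fit in memory.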

Model:

ranker , Ranks the tweets using multiple models that I implemented, supported by GloVe, Word2Vec, WordNet, SpellChecker and Thesaurus.
searcher , Returns the relevant tweets
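
One common way to rank with word embeddings such as GloVe or Word2Vec is to average the word vectors of the query and of each tweet, then sort the tweets by cosine similarity. The sketch below shows that approach under loud assumptions: this README does not say how the `ranker` combines its models, the `EMBEDDINGS` table holds made-up 2-dimensional toy vectors rather than trained ones, and every function name is hypothetical.

```python
import math

# Toy vectors invented for illustration; real GloVe or Word2Vec vectors
# come from a trained model and have far more dimensions.
EMBEDDINGS = {
    "covid": [1.0, 0.0],
    "virus": [0.9, 0.1],
    "vaccine": [0.7, 0.3],
    "football": [0.0, 1.0],
}


def average_vector(terms):
    """Average the vectors of the known terms; None if no term is known."""
    vectors = [EMBEDDINGS[t] for t in terms if t in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_terms, tweets, k=10):
    """Return the ids of the k tweets most similar to the query."""
    query_vec = average_vector(query_terms)
    if query_vec is None:
        return []
    scored = []
    for tweet_id, terms in tweets.items():
        tweet_vec = average_vector(terms)
        if tweet_vec is not None:
            scored.append((cosine(query_vec, tweet_vec), tweet_id))
    scored.sort(reverse=True)  # highest similarity first
    return [tweet_id for _, tweet_id in scored[:k]]


tweets = {"t1": ["virus", "vaccine"], "t2": ["football"], "t3": ["covid"]}
top = rank(["covid"], tweets, k=2)
```

With these toy vectors, the tweet containing "covid" scores highest and the football tweet scores lowest for the query "covid".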

💡 Prerequisite

Python 3.7

🛠️ Installation

With GitHub

git clone https://github.com/samuelbenichou/SearchEngine.git
cd SearchEngine/
python3 setup.py install
