🔍 Simple Search Engine

A document search engine that uses natural language processing techniques to rank documents by their relevance to a query. Built with Flask, TF-IDF, and cosine similarity.

📋 Features

Text Preprocessing: Tokenization, stop word removal, and lemmatization
Inverted Index Construction: Allows efficient term-based lookups
TF-IDF Calculation: Measures the importance of terms in each document
Cosine Similarity: Computes similarity between the query and documents for ranking
Spell Checking: Automatically corrects misspelled terms in user queries
Web Interface: Search through documents using a simple HTML form

🛠️ Prerequisites

Python 3.10+
Internet connection (for downloading NLTK stopwords and spaCy model)

🚀 Installation

Clone the Repository
```
git clone https://github.com/Zilean12/Search-Engine.git
cd Search-Engine
```

Install Required Packages

Install the necessary Python packages listed in requirements.txt:
```
pip install -r requirements.txt
```
Download spaCy Model
```
python -m spacy download en_core_web_sm
```
Download NLTK Data

Download the stopwords dataset from NLTK.

Run the Application

Start the Flask app by running:
```
python app.py
```

The app will be available at http://127.0.0.1:5000.

🗂️ Project Structure

1. app.py: Main application file with text processing, TF-IDF calculation, and Flask routes.

2. templates/index.html: HTML template for the search interface.

3. static/style.css: CSS file for styling the web interface.

4. requirements.txt: List of required Python packages.

🔍 Usage

Open the app in your browser (http://127.0.0.1:5000).
Enter a search query in the input box and click "Search."
The application will display documents ranked by relevance to the query, showing their cosine similarity scores. Misspelled terms in the query will be automatically corrected.

🔑 Key Components

Text Preprocessing

The text is converted to lowercase, punctuation and stop words are removed, and the remaining words are lemmatized (reduced to their dictionary form).
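
The first steps of this pipeline can be sketched with the standard library alone. This is a minimal illustration, not the code in app.py: the function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for the example (the app loads the NLTK stopwords list), and the final word-normalization step is left out because it depends on the spaCy model.

```python
import string

# Tiny illustrative stop word list (an assumption for this sketch);
# the application itself relies on the NLTK stopwords dataset.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


print(preprocess("The cats are sleeping in the garden!"))
# ['cats', 'sleeping', 'garden']
```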

Inverted Index

An inverted index is created to store document IDs for each unique term, facilitating fast lookup of terms in documents.
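
As a rough sketch of the idea, an inverted index can be held in a dictionary that maps each term to the set of document IDs containing it. The function name `build_inverted_index` and the sample documents are hypothetical; the structure used in app.py may differ.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map each term to the set of IDs of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term].add(doc_id)
    return dict(index)


# Documents are assumed to be preprocessed into token lists already.
docs = {
    0: ["search", "engine"],
    1: ["flask", "search"],
    2: ["cosine", "similarity"],
}
index = build_inverted_index(docs)
print(sorted(index["search"]))  # [0, 1]
```

Looking up a term is then a single dictionary access, so only the documents that actually contain a query term need to be scored.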

TF-IDF Calculation

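A TF-IDF score is calculated for each term in each document. TF (term frequency) measures how often a term occurs in a document, and IDF (inverse document frequency) down-weights terms that occur in many documents. The product of the two measures the term's importance in that document.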
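
TF-IDF has several common variants, and this README does not say which one app.py uses. The sketch below assumes one textbook form, with TF as the term count divided by the document length and IDF as log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


vectors = tf_idf([["search", "engine"], ["search", "index"]])
print(vectors[0])
```

With this weighting, a term that appears in every document (here "search") gets a score of zero, since it does not help tell documents apart.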

Cosine Similarity

The similarity between the query and each document is calculated using cosine similarity, which helps rank documents based on relevance.
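
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch over sparse {term: weight} dictionaries follows; the names `cosine_similarity` and `rank` and the sample weights are assumptions for illustration, and app.py may use NumPy arrays instead.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def rank(query_vec: dict[str, float],
         doc_vecs: dict[int, dict[str, float]]) -> list[tuple[int, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranked = rank(
    {"search": 1.0},
    {0: {"search": 1.0, "engine": 1.0}, 1: {"flask": 1.0}, 2: {"search": 2.0}},
)
print([doc_id for doc_id, _ in ranked])  # [2, 0, 1]
```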

Spell Checking

The application uses a custom spell checker to automatically correct misspelled terms in user queries, improving the search experience.
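
The README does not describe how the custom spell checker works. One plausible approach is to replace each unknown query word with its closest match in the indexed vocabulary. The sketch below does this with the standard library's `difflib` as a stand-in, although the project lists rapidfuzz as a dependency; the function name `correct_query`, the `cutoff` value, and the sample vocabulary are all assumptions.

```python
import difflib


def correct_query(query: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Replace each unknown word with its closest vocabulary term, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates above the cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


vocab = ["search", "engine", "document", "similarity"]
print(correct_query("serch engin", vocab))  # search engine
```

Words with no sufficiently close match are left unchanged, so an unusual but correct term is not silently rewritten.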

🧰 Dependencies

Flask: Web framework for Python, used for handling HTTP requests and serving the web application.
NLTK (Natural Language Toolkit): Used for text preprocessing tasks, such as removing stopwords.
NumPy: Provides support for numerical operations and vector calculations, essential for data processing.
Tabulate: Formats data in tables for improved readability in the console.
Colorama: Cross-platform library for adding color formatting to terminal output, making console messages more intuitive.
spaCy: Advanced NLP library, used with the en_core_web_sm model to support text processing and tokenization.
rapidfuzz: Library for fuzzy string matching, enhancing search capabilities by identifying approximate matches.
