simpleRAGBot: An Advanced NLP Chatbot

Overview

simpleRAGBot is a Natural Language Processing (NLP) chatbot that uses the Retrieval-Augmented Generation (RAG) methodology to ground its responses in a document base, keeping them context-aware and relevant. It processes several document types, including PDFs, Markdown files, and web URLs.

What is RAG?

Retrieval-Augmented Generation (RAG) combines retrieval-based and generative NLP models. For each query, it retrieves relevant passages from a document collection and passes them to a generative model as context. Responses are therefore grounded in the source documents rather than in the model's training data alone, which improves their accuracy and relevance.
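
To make the retrieve-then-generate idea concrete, here is a minimal sketch that uses only the Python standard library. It is not simpleRAGBot's implementation: it ranks documents by word-count cosine similarity instead of FAISS embeddings, and it stops at building the prompt that a generative model would receive. The names `tokenize`, `cosine`, `retrieve`, `build_prompt`, and `DOCS` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Retrieval step: return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Augmentation step: put the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Toy document collection (hypothetical content).
DOCS = [
    "FAISS builds an index over dense vectors for similarity search.",
    "Flask is a lightweight web framework for Python.",
    "BART can summarize long documents into short summaries.",
]

if __name__ == "__main__":
    question = "Which library handles similarity search?"
    # The resulting prompt is what would be sent to the generative model.
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

In a real RAG chain, the word-count vectors would be replaced by dense embeddings and the final prompt would be passed to the language model for generation.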

Features

Asynchronous Web Scraping: For dynamic and varied content acquisition.
Document Summarization: Utilizes the facebook/bart-large-cnn model to condense large documents into pertinent summaries.
Document Processing and Indexing: Splits documents into segments and indexes them with FAISS (Facebook AI Similarity Search) for rapid retrieval.
RAG Chain with Mistral-7B-Instruct-v0.2 Model: Enriched response generation leveraging a powerful generative model.
Web Server and CLI Support: Offers both a Flask-based API for web interaction and a command-line interface for direct usage.
Configurable Multi-threaded System: Ensures efficient prompt processing and response generation.
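
As an illustration of the multi-threaded prompt processing listed above, the sketch below queues prompts, processes them on a configurable number of worker threads, and stores each response under an ID. It is an assumption about how such a system could be structured, not the project's actual code. `PromptProcessor`, `handler`, and `num_workers` are hypothetical names.

```python
import queue
import threading
import uuid


class PromptProcessor:
    """Process prompts on a fixed pool of worker threads (illustrative only)."""

    def __init__(self, handler, num_workers=2):
        self._handler = handler        # callable: prompt text -> response text
        self._jobs = queue.Queue()     # pending (prompt_id, prompt) pairs
        self._results = {}             # prompt_id -> response
        self._lock = threading.Lock()  # guards self._results
        for _ in range(num_workers):
            threading.Thread(target=self._work, daemon=True).start()

    def _work(self):
        """Worker loop: take a job, run the handler, store the response."""
        while True:
            prompt_id, prompt = self._jobs.get()
            try:
                response = self._handler(prompt)
            except Exception as exc:   # keep the worker alive on handler errors
                response = f"error: {exc}"
            with self._lock:
                self._results[prompt_id] = response
            self._jobs.task_done()

    def submit(self, prompt):
        """Queue a prompt and return an ID for fetching the result later."""
        prompt_id = uuid.uuid4().hex
        self._jobs.put((prompt_id, prompt))
        return prompt_id

    def result(self, prompt_id):
        """Return the response, or None if it is not ready or the ID is unknown."""
        with self._lock:
            return self._results.get(prompt_id)

    def wait(self):
        """Block until every queued prompt has been processed."""
        self._jobs.join()
```

A caller would construct it with the function that runs the RAG chain, for example `PromptProcessor(run_chain, num_workers=4)` where `run_chain` is a placeholder, then call `submit()` and poll `result()` with the returned ID.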

Installation

Prerequisites

Ensure you have the following installed:

Python 3.8+
Pip (Python package installer)
A large language model (by default https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

Setup

Clone the repository:
```
git clone https://github.com/your-username/simpleRAGBot.git
```
Navigate to the cloned directory:
```
cd simpleRAGBot
```
Install required packages:
```
pip install -r requirements.txt
```

Usage

Command Line Interface (CLI)

To use simpleRAGBot via the command line:

Run the main script:
```
python main.py
```
Follow the on-screen prompts to input your queries.

Web Interface

simpleRAGBot also offers a web interface powered by Flask:

Start the Flask server:
```
python main.py
```
Open your browser and navigate to http://localhost:5000 (or the configured port).

Endpoints

/prompt (POST): Submit a prompt/query.
/result (GET): Fetch the result of a submitted prompt. Requires prompt_id as a parameter.
/system (GET): Get system information, including the model used, app version, and available product names.
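
The submit-then-fetch flow across /prompt and /result could be driven from a small client like the sketch below, which uses only the standard library. The request and response formats are not documented here, so several details are assumptions: the JSON body `{"prompt": ...}`, the `prompt_id` and `result` response fields, and passing `prompt_id` as a query-string parameter. Check the server code before relying on any of them.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # default Flask port from this README


def prompt_request(prompt, base_url=BASE_URL):
    """Build the POST request for /prompt. The JSON body shape is an assumption."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def result_url(prompt_id, base_url=BASE_URL):
    """Build the GET URL for /result, assuming prompt_id is a query parameter."""
    query = urllib.parse.urlencode({"prompt_id": prompt_id})
    return f"{base_url}/result?{query}"


def ask(prompt, poll_seconds=1.0, max_polls=30):
    """Submit a prompt, then poll /result until a response is available."""
    with urllib.request.urlopen(prompt_request(prompt)) as resp:
        prompt_id = json.load(resp)["prompt_id"]  # assumed response field
    for _ in range(max_polls):
        with urllib.request.urlopen(result_url(prompt_id)) as resp:
            payload = json.load(resp)
        if payload.get("result") is not None:     # assumed response field
            return payload["result"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"no result for prompt_id {prompt_id}")
```

With the server running, `ask("What is RAG?")` would return the generated answer under these assumptions. If the server signals a pending result with an HTTP error status instead of an empty field, `urlopen` raises `urllib.error.HTTPError` and the polling loop would need to catch it.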

Default Ports

Flask server: Port 5000 (configurable in config.py)

Configurations

Modify config.py to adjust settings like ports, model parameters, document paths, etc.

Packages and Libraries

This project relies on several key Python libraries:

Flask: For the web server and API endpoints.
PyTorch: For handling machine learning models.
Transformers: From Hugging Face, for pretrained NLP models.
FAISS: For efficient similarity search and clustering of dense vectors.
TQDM: For progress bars in loops and console output.
Playwright: For asynchronous web scraping.

Contributing

Contributions to simpleRAGBot are welcome! Please read CONTRIBUTING.md for guidelines on how to contribute.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Jan Miller - @miller_itsec
