Skip to content

Latest commit

 

History

History
139 lines (99 loc) · 4.88 KB

README.md

File metadata and controls

139 lines (99 loc) · 4.88 KB

Adventureworks Database

This repository provides a framework for setting up a relational MySQL database using Python, as well as building a Text-to-SQL Streamlit app that converts plain language questions into SQL queries. The application interacts with a SQLite database populated with AdventureWorks sample data.

Two approaches are used: Prompt Engineering and Retrieval-Augmented Generation (RAG). Both utilize the same LLM, Gemini 1.5 Flash, via Google AI Studio to execute SQL queries on the database and retrieve results. For more details, see the publication.

Installation

  1. Clone this repository:
git clone https://github.com/ahmedsalim3/AdventureWorks-Database.git
  1. Navigate to the repository directory:
cd AdventureWorks-Database
  1. Create a Virtual Environment (Recommended):
python3 -m venv env
source env/bin/activate
  1. Install Requirements:
pip install -r requirements.txt

Create Database

  1. Configure the database at config

  2. run MySQLDatabaseManager class to create the database, follow TODO file for more details

python3 -m src.rdbms.csv2mysql
  1. To explore the database tables run:
python3 -m src.rdbms.table_info

Set Up Your API Key

  1. Get your API key and set it in the config. You can obtain a free one from here, and make sure to uncomment the local run lines

  2. Paste it in the .env_example file and rename that file to .env.

Run the scripts

You can run either the Prompt Engineering or the RAG approach from the main file. Just ask your question and choose the method.

python3 -m src.main

Run the App

To run the Streamlit app, use the following command:

python3 -m streamlit run app.py

Run the app via Dockerfile

To run the app using Docker, follow these steps:

  1. Build the Docker image:
docker build -t image_name .
  1. Run the Docker container:
docker run -p 8501:8501 image_name

Repo's directory structure

.
├── app                          <- Streamlit related fils
│   ├── _pages                  
│   │   ├── prompt_page.py       <- Prompting app page
│   │   ├── rag_page.py          <- RAG app page
│   │   └── utils.py       
│   └── ui                       <- User interface components
│
├── data                         <- Data directory containing databases and raw data
│   ├── database
│   └── raw
│
└── src     
│   ├── rag
│   │   ├── documents.py          <- Document handling for RAG
│   │   ├── example               <- Vector database example
│   │   └── vector_db.py          <- sqlite-vec database
│   │
│   ├── rdbms                     <- Relational Database Management System tools
│   │   ├── TODO.md
│   │   ├── csv2mysql.py          <- Module for converting CSV to MySQL
│   │   ├── mysql2sqlite          <- Submodule for converting MySQL to SQLite
│   │   └── table_info.py
│   │
│   ├── config.py                 <- Configuration settings for the project
│   ├── constants.py              <- Constant values related to AdventureWorks database
│   ├── main.py                   <- Main file to run which approach
│   ├── prompting_text_to_sql.py  <- Prompting implementation for text-to-SQL
│   ├── rag_text_to_sql.py        <- RAG implementation for text-to-SQL
│   └── utils.py
│
├── app.py                        <- Main Streamlit app
├── requirements.txt