A Flask-based web application that lets users query a CSV dataset in natural language. The application uses an OpenAI GPT model to interpret each question and generate a pandas query that filters and analyzes the data.
- Natural language query processing
- Interactive chat interface
- Dynamic data table display
- Automated pandas query generation
- Comprehensive logging system
- Question type classification (filtering vs explanation)
- Secure query execution system
- Python 3.8+
- OpenAI API key
- Flask
- Pandas
- Clone the repository:
git clone https://github.com/sybil443/chat_w_my_data.git
cd chat_w_my_data
- Create and activate a virtual environment:
# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory with the following:
OPENAI_API_KEY=your_api_key_here
MODEL=gpt-4
CSV_FILE_PATH=path/to/your/data.csv
FLASK_DEBUG=True
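The README does not say how the .env file is loaded (projects like this often rely on python-dotenv). As a rough illustration of how the KEY=VALUE lines above can end up as environment variables, here is a minimal standard-library sketch. `load_env_file` is a hypothetical helper and is not taken from this repository.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines and export them to os.environ (hypothetical helper)."""
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without a KEY=VALUE shape.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Calling `load_env_file()` before reading the configuration would make values such as `MODEL` and `CSV_FILE_PATH` visible to `os.getenv`.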
project_folder/
├── src/
│ ├── __init__.py
│ ├── query_system.py
│ └── logger_config.py
├── static/
│ ├── user-avatar.png
│ └── assistant-avatar.png
├── templates/
│ └── index.html
├── logs/
│ └── application_[timestamp].log
├── app.py
├── config.py
├── requirements.txt
└── README.md
- Start the application:
python app.py
- Open your browser and navigate to:
http://localhost:5000
- Enter questions in the chat interface. Example questions:
- "What is this dataset about?"
- "Show me everything about Google"
- "What's the revenue of Tesla?"
- "Compare the market capitalization of different companies"
- Uses OpenAI's GPT model to understand user queries
- Classifies questions as either requiring data filtering or general explanation
- Generates appropriate pandas queries based on user questions
- Dynamic table generation
- Consistent column ordering (Question first)
- Clean and responsive interface
- Error handling for data display
- Comprehensive logging of all operations
- Separate logging for application and query system
- Log rotation to manage file sizes
- Debug and info level logging
- Safe query execution system
- Protection against harmful operations
- Input validation
- Restricted pandas operations
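The security points above do not describe how generated queries are checked. One common approach, shown here only as a sketch, is to parse the generated filter expression with Python's `ast` module and accept it only if every node is on an allow-list. The function name `is_safe_filter`, the allow-list, and the assumption that the model emits a boolean expression over column names are all illustrative and not confirmed by the project.

```python
import ast

# Node types permitted in a filter such as: Company == "Google" and Revenue > 1000
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not, ast.USub,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.In, ast.NotIn, ast.Name, ast.Load, ast.Constant, ast.List, ast.Tuple,
)


def is_safe_filter(expression: str, allowed_columns: set) -> bool:
    """Return True only for comparisons over known columns (hypothetical check)."""
    try:
        tree = ast.parse(expression, mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        # Calls, attribute access, subscripts, lambdas, etc. are not allow-listed.
        if not isinstance(node, ALLOWED_NODES):
            return False
        # Every bare name must be a real column, which also blocks builtins.
        if isinstance(node, ast.Name) and node.id not in allowed_columns:
            return False
    return True
```

This check is deliberately strict: it also rejects legitimate pandas expressions that use method calls, so a real implementation would need to decide which additional operations to permit.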
The application can be configured through config.py:
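import os

class Config:
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    MODEL = os.getenv("MODEL", "gpt-4")  # falls back to gpt-4 when unset
    CSV_FILE_PATH = os.getenv("CSV_FILE_PATH")
    # Debug is enabled only when FLASK_DEBUG is "true" (case-insensitive); defaults to on
    DEBUG = os.getenv("FLASK_DEBUG", "True").lower() == "true"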
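Two details of this configuration are easy to miss: the API key and CSV path have no defaults, and the debug flag is enabled only by the literal text "true". The sketch below illustrates both. `parse_bool` and `missing_settings` are hypothetical helpers, and treating `OPENAI_API_KEY` and `CSV_FILE_PATH` as required is an assumption based on their lack of defaults.

```python
import os


def parse_bool(value: str) -> bool:
    """Mirror the DEBUG rule: only the text "true" (any case) enables the flag."""
    return value.lower() == "true"


def missing_settings(env=None, required=("OPENAI_API_KEY", "CSV_FILE_PATH")) -> list:
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Note that values such as `1` or `yes` would leave debug mode off under this rule, so `FLASK_DEBUG=True` or `FLASK_DEBUG=False` are the unambiguous choices.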
Logs are stored in the logs directory, with file names in the following format:
application_YYYYMMDD_HHMMSS.log
Log levels:
- DEBUG: Detailed information (file only)
- INFO: General flow (console and file)
- ERROR: Error messages with stack traces
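The level split described above (DEBUG to file only, INFO and higher to both console and file, rotating files with a timestamped name) can be approximated with the standard `logging` module. This is a sketch under assumptions: `build_logger`, the 1 MB rotation size, the backup count, and the message format are illustrative and may differ from what `logger_config.py` does.

```python
import logging
import os
from datetime import datetime
from logging.handlers import RotatingFileHandler


def build_logger(name="application", log_dir="logs"):
    """Create a logger with a DEBUG file handler and an INFO console handler."""
    os.makedirs(log_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = os.path.join(log_dir, f"application_{stamp}.log")

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let handlers decide what to keep
    logger.handlers.clear()         # avoid duplicate output on repeated calls

    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )

    # File handler: everything from DEBUG up, rotated at an assumed 1 MB.
    file_handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    # Console handler: INFO and above only.
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger, log_path
```

Inside an `except` block, `logger.exception("message")` logs at ERROR level and appends the stack trace, which matches the ERROR behavior listed above.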
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
[Your chosen license]
Your Name - sybilshi@gmail.com
Project Link: https://github.com/sybil443/chat_w_my_data.git