This repo contains the data and code for our paper "Is GPT-4 a Good Data Analyst?".
Create an account and get the API key for OpenAI (https://openai.com).
OPENAI_API_KEY=YOUR_KEY
Create an account and get the API key for Google retrieval from SerpApi (https://serpapi.com).
SERPAPI_KEY=YOUR_KEY
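If the two keys are exported as environment variables under the names shown above (an assumption about how they are supplied), a script could check that both are present before starting a run. The helper name `load_api_keys` is hypothetical and not part of this repo; this is only a minimal sketch.

```python
import os


def load_api_keys(env=None):
    """Return both API keys from the environment, or raise if one is missing.

    Hypothetical helper for illustration; not part of this repo.
    """
    env = os.environ if env is None else env
    required = ("OPENAI_API_KEY", "SERPAPI_KEY")
    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Calling `load_api_keys()` with no argument reads from `os.environ`; passing a dict instead makes the check easy to try without touching the real environment.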
pip install -r requirements.txt
Download the SQL databases from the nvBench dataset into the current directory and name the folder nvBench-main.
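Before running the main script, it may help to confirm that the dataset folder is where it is expected. The sketch below assumes the databases use the `.sqlite` extension (as the demo file does); the function name `find_sqlite_files` is hypothetical and not part of this repo.

```python
from pathlib import Path


def find_sqlite_files(root="nvBench-main"):
    """Return sorted .sqlite files found under root, or raise if root is absent.

    Hypothetical helper; assumes the databases end in ".sqlite".
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Expected dataset folder not found: {root_path}")
    # Search recursively, since the databases may sit in subfolders.
    return sorted(root_path.rglob("*.sqlite"))
```

An empty list means the folder exists but no matching database files were found under it.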
python main.py
We provide a demo, which you can try by running the following commands:
cd demo/
python demo_main.py
You will first be prompted to enter your OpenAI and SerpApi keys (the second prompt, which asks for a Google API key, expects the SerpApi key):
* Please enter your OpenAI API key:
* Please enter your Google API key:
Then you will be prompted to choose the SQLite database file and the SQL schema file:
* Please choose the database file to read:
* Please choose the schema file to read:
You can select demo/apartment_rentals.sqlite and demo/schema.sql, or use your own data.
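When using your own data, it can be useful to check what a SQLite file contains before selecting it. The following is a minimal sketch using Python's standard `sqlite3` module; the function name `list_tables` is hypothetical and the demo itself may inspect the database differently.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of user tables in a SQLite database file.

    Hypothetical helper for illustration; not part of this repo.
    """
    # sqlite3.connect would silently create a missing file, so check first.
    if not Path(db_path).is_file():
        raise FileNotFoundError(f"No such database file: {db_path}")
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Skip SQLite's internal bookkeeping tables.
    return [name for (name,) in rows if not name.startswith("sqlite_")]
```

For example, `list_tables("demo/apartment_rentals.sqlite")` would return the table names in the demo database, assuming the command is run from the repo root.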
Next, you will be prompted to enter a question. Press Enter to use the default question, or type your own:
* Please enter your question (Press Enter to use the default question):
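The "press Enter for the default" behaviour can be sketched as follows. This is an illustration of the pattern only, not the code in `demo_main.py`; the function name `ask_question` and its `read` parameter are hypothetical, and the default question is passed in by the caller because it is not stated here.

```python
def ask_question(default, read=input):
    """Prompt for a question and fall back to `default` on empty input.

    Hypothetical helper; `read` is injectable so the logic can be tested
    without a terminal.
    """
    answer = read(
        "Please enter your question (Press Enter to use the default question): "
    )
    # An empty or whitespace-only reply selects the default question.
    return answer.strip() or default
```

Passing the reader as a parameter keeps the fallback logic separate from the terminal I/O.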
The pipeline then starts running.
The analysis results generated by GPT-4, along with the annotation results for each analysis bullet point, are in data/analysis_results.xlxs.
Other data and experimental results will be released soon.
If the code is used in your research, please star our repo and cite our paper as follows:
@inproceedings{cheng2023gptda,
title={Is GPT-4 a Good Data Analyst?},
author={Liying Cheng and Xingxuan Li and Lidong Bing},
booktitle={Findings of EMNLP},
url={https://arxiv.org/abs/2305.15038},
year={2023}
}