Code, data, and model for our ACL 2023 paper Text-to-SQL Error Correction with Language Models of Code.
- Installation
- Data
- Preprocessing
- Training
- Evaluation
- Citation
Please run the following commands to create a conda environment with Python 3.9 and install the required packages.
conda create -n sqledit python=3.9 pip
conda activate sqledit
pip install -r requirements.txt
Please first download the original Spider dataset from this link and unzip it into the data/ folder.
unzip spider.zip -d data/
Then, please download our synthesized SQL error correction data from this link and also put it in the data/ folder.
The data/ folder should be organized as follows:
.
├─── data
│ ├─── spider
│ ├─── ...
│ ├─── spider-dev-bridge.json
│ ├─── spider-dev-codet5.json
│ ├─── spider-dev-smbop.json
│ ├─── spider-train-bridge.json
│ ├─── spider-train-codet5.json
│ ├─── spider-train-smbop.json
│ ├─── sqledit_dev_gold.sql
│ ...
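Before preprocessing, it can help to confirm that the files listed in the tree above are in place. The following is a minimal sketch, not part of this repository: `check_data_layout` is a hypothetical helper name, and it checks only the entries shown explicitly above (anything behind the `...` entries is not checked).

```shell
#!/usr/bin/env bash
# Sketch: verify that the entries shown in the data/ tree above exist.
# check_data_layout is a hypothetical helper, not part of this repository.
check_data_layout() {
  local data_dir="${1:-data}"
  local missing=0
  local split parser name

  # The original Spider dataset, unzipped into data/spider.
  if [ ! -d "$data_dir/spider" ]; then
    echo "missing directory: $data_dir/spider"
    missing=1
  fi

  # Synthesized error correction data: one file per split and base parser.
  for split in dev train; do
    for parser in bridge codet5 smbop; do
      name="$data_dir/spider-$split-$parser.json"
      if [ ! -f "$name" ]; then
        echo "missing file: $name"
        missing=1
      fi
    done
  done

  # Gold SQL queries for the dev set.
  if [ ! -f "$data_dir/sqledit_dev_gold.sql" ]; then
    echo "missing file: $data_dir/sqledit_dev_gold.sql"
    missing=1
  fi

  # Returns 0 when everything listed is present, 1 otherwise.
  return "$missing"
}
```

After sourcing the snippet, `check_data_layout data` prints one line per missing entry and returns a non-zero status if anything is absent.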
To preprocess the data (here for the smbop base parser):
python run.py --preproc --use_content --query_type pydict --edit_type program --base_parser smbop
To train the model from the CodeT5 base checkpoint:
mkdir model
python run.py --train --load_checkpoint Salesforce/codet5-base --save_checkpoint model/codet5-sqledit --seed 42 --gpu 0
To evaluate the trained model:
python run.py --eval --load_checkpoint model/codet5-sqledit --gpu 0
You may download our pre-trained model checkpoints from this link. They include our CodeT5-PyDict+Program model trained for each of the three text-to-SQL base parsers in our paper.
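The preprocessing, training, and evaluation commands above can also be chained so that a failing step stops the run. This is a rough sketch under stated assumptions: `run_step`, `run_pipeline`, and the `DRY_RUN` variable are hypothetical names introduced here, and the `run.py` invocations are copied unchanged from the commands above. The only deviation is `mkdir -p`, used so that a rerun does not fail when `model/` already exists.

```shell
#!/usr/bin/env bash
# Sketch: chain the commands from this README and stop at the first failure.
# run_step, run_pipeline, and DRY_RUN are hypothetical, not part of this repository.

# Print a command, then run it unless DRY_RUN=1 is set.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

run_pipeline() {
  # Preprocessing (smbop base parser, as in the command above).
  run_step python run.py --preproc --use_content --query_type pydict \
    --edit_type program --base_parser smbop || return 1

  # Training from the CodeT5 base checkpoint.
  run_step mkdir -p model || return 1
  run_step python run.py --train --load_checkpoint Salesforce/codet5-base \
    --save_checkpoint model/codet5-sqledit --seed 42 --gpu 0 || return 1

  # Evaluation of the checkpoint saved by the training step.
  run_step python run.py --eval --load_checkpoint model/codet5-sqledit --gpu 0
}
```

After sourcing the snippet, `DRY_RUN=1 run_pipeline` prints the four commands without executing them, and `run_pipeline` runs them in order.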
@inproceedings{chen-etal-2023-sqledit,
title = "Text-to-SQL Error Correction with Language Models of Code",
author = "Chen, Ziru and
Chen, Shijie and
White, Michael and
Mooney, Raymond and
Payani, Ali and
Srinivasa, Jayanth and
Su, Yu and
Sun, Huan",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2305.13073"
}