Code for our EMNLP 2023 paper "Error Detection for Text-to-SQL Semantic Parsing". An updated version is available on arXiv.
- Install PyTorch (1.12.1) and torch-geometric (2.1.0.post1), following the installation guide at https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html. The code is tested with Python 3.8.
- Install other required libraries.
pip install -r requirements.txt
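After installation, a quick check along the following lines may help confirm that the environment matches the tested versions. This is only a sketch: `check_versions` is a hypothetical helper that is not part of this repository, and the pip distribution names `torch` and `torch-geometric` are assumed.

```python
from importlib import metadata

# Versions stated in this README; the distribution names are assumptions.
EXPECTED = {"torch": "1.12.1", "torch-geometric": "2.1.0.post1"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu113" are ignored when comparing.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


def print_report():
    """Print one status line per expected package."""
    for name, (installed, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

Calling `print_report()` from a Python 3.8 interpreter in the target environment lists each package with its installed and expected version.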
- Download preprocessed data and model checkpoints
  - Preprocessed data collected from the three base parsers is available at url.
    - Unzip the downloaded file and put the `datasets` folder in the `preprocessing` folder.
  - Model checkpoints for the simulated interactive evaluations are available at url (one checkpoint per base parser).
    - Unzip the downloaded file and put the folders in the `experiments` folder. The `Parser_{parser}` folders are for parser-dependent baselines.
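Before moving on, it can be useful to verify that the unzipped folders ended up where the later steps expect them. The sketch below assumes that both `preprocessing/datasets` and `experiments` sit directly under the repository root; `missing_dirs` is a hypothetical helper, not something shipped with this code.

```python
from pathlib import Path


def missing_dirs(repo_root, required=("preprocessing/datasets", "experiments")):
    """Return the required directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_dir()]


def report_layout(repo_root="."):
    """Print whether the data and checkpoint folders are in place."""
    missing = missing_dirs(repo_root)
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Data and checkpoint folders are in place.")
```

Running `report_layout()` from the repository root prints any folder that still needs to be created or moved.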
- Prepare training data. In `preprocessing/dataset_beam.py`, choose the intended data files `ed_{parser}_beam_train_sim2.json` and `ed_{parser}_beam_dev_sim2.json`, then execute `dataset_beam.py`. This produces `.dat` files for the training and dev sets, as well as `.pkl` files for the indexers of non-terminal nodes.
cd preprocessing
python3 dataset_beam.py
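The indexer implementation is not described in this README, so the following is only an illustrative sketch of the general idea: an indexer maps each non-terminal node label to an integer id, and the mapping is pickled so the same ids can be reused later. `NodeIndexer`, its methods, and the labels are hypothetical; the actual `.pkl` files may store a different structure.

```python
import pickle


class NodeIndexer:
    """Hypothetical label-to-id mapping; not the repository's actual class."""

    def __init__(self, unk="<unk>"):
        self.unk = unk
        self.label_to_id = {unk: 0}

    def add(self, label):
        # Assign the next free id the first time a label is seen.
        return self.label_to_id.setdefault(label, len(self.label_to_id))

    def lookup(self, label):
        # Unseen labels fall back to the id of the unknown token.
        return self.label_to_id.get(label, self.label_to_id[self.unk])


indexer = NodeIndexer()
for label in ["select", "from", "where", "select"]:  # made-up labels
    indexer.add(label)

# Round-trip the plain mapping through pickle, as one would when saving
# and reloading a .pkl file; only the dict is stored, not the class.
restored = pickle.loads(pickle.dumps(indexer.label_to_id))
```

Storing only the plain dictionary keeps the pickled file independent of the class definition, which is one common way to make such files easier to reload.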
- Set the paths to the training and dev datasets, then run `bash train.sh` for CodeBERT+GAT models or `train_no_graph.sh` for CodeBERT models.
bash train.sh
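When training is launched from a larger Python workflow, the choice between the two scripts can be wrapped as below. This is a sketch under assumptions: the script names come from this README, while the dictionary keys and the helpers `train_command` and `run_training` are made up for illustration.

```python
import subprocess

# Script names are taken from this README; the keys are illustrative labels.
TRAIN_SCRIPTS = {"codebert+gat": "train.sh", "codebert": "train_no_graph.sh"}


def train_command(model):
    """Return the shell command for the given model variant."""
    try:
        return ["bash", TRAIN_SCRIPTS[model.lower()]]
    except KeyError:
        raise ValueError(f"unknown model variant: {model!r}") from None


def run_training(model):
    """Launch training; dataset paths must already be set as described above."""
    subprocess.run(train_command(model), check=True)
```

For example, `run_training("CodeBERT")` would execute `bash train_no_graph.sh` in the current directory and raise an error if the script exits with a non-zero status.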
- Prepare evaluation data. First choose the target evaluation dataset and the source parser's non-terminal node indexer in the `main()` function of `dataset_beam.py`. Then execute the script to obtain `{test_set}_sim2.dat`.
cd preprocessing
python3 dataset_beam.py
- Set the paths to the evaluation dataset and the model checkpoint, then run `bash test.sh` for CodeBERT+GAT models or `test_no_graph.sh` for CodeBERT models.
bash test.sh
The prediction results, `eval_{test_name}.json`, can be found in the checkpoint folder.
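The schema of the prediction file is not documented here, so a cautious first look is to locate it and inspect only its top-level shape. The helpers `find_eval_files` and `summarize_json` below are hypothetical and make no assumption about the fields inside the JSON.

```python
import json
from pathlib import Path


def find_eval_files(checkpoint_dir):
    """List files matching eval_*.json in the given checkpoint folder."""
    return sorted(Path(checkpoint_dir).glob("eval_*.json"))


def summarize_json(path):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
    if isinstance(data, dict):
        summary["keys"] = sorted(data)[:10]  # at most the first ten keys
    return summary
```

Passing the checkpoint folder to `find_eval_files` and each result to `summarize_json` shows whether the file is a list or a dictionary and how many entries it holds, which is usually enough to decide how to parse it further.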
@inproceedings{chen-etal-2023-error,
title = "Error Detection for Text-to-{SQL} Semantic Parsing",
author = "Shijie Chen and Ziru Chen and Huan Sun and Yu Su",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.785",
doi = "10.18653/v1/2023.findings-emnlp.785",
pages = "11730--11743",
}