QPPNet in PyTorch

The README below was written for the release tagged rabbit721.

This code contains a sample implementation for Plan-Structured Deep Neural Network Models for Query Performance Prediction presented at VLDB 2019, and the code for training/testing on

TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with Postgres

The TPC-H data are generated with ./dbgen -s <scale factor>.

The TPC-H queries are generated using ./qgen <query_id> -r <seed number> where varying seed random number is used to generate different queries from a template.

Tables in PostgresSQL are created using dss.ddl and then have indexes created with tpch-postgres-index-ddl.sql.

The query plan structure and query performance metrics are retrieved as json objects by running the generated queries above with the explain (analyze, format JSON, verbose) statement prepended.
TPC-H queries generated using https://github.com/gregrahn/tpch-kit.git and benchmarked with NoisePage
TPC-C queries and smallbank dataset generated using OLTP and benchmarked with NoisePage

Prerequisites

Linux or macOS
Python 3

Getting Started

Installation

Cloning the repo:

git clone https://github.com/rabbit721/QPPNet.git
cd QPPNet

Install the required python packages:
- For pip: pip install -r requirements.txt
- For conda: conda env create -f environment.yml

Getting the Datasets

TPC-H benchmarked with PostgresSQL

# Under directory dataset/postgres_tpch_dataset
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch0p1g.zip && unzip psqltpch0p1g.zip
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch1g.zip && unzip psqltpch1g.zip
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/postgres/tpch/psqltpch10g.zip && unzip psqltpch10g.zip

TPC-H benchmarked with NoisePage:

Data files already located under directory dataset/terrier_tpch_dataset as execution_0p1G.csv, execution_1G.csv, and execution_10G.csv
TPC-C and smallbank benchmarked with NoisePage:

Data files already located under directory dataset/oltp_dataset as tpcc_pipeline.csv and sb_pipeline.csv

Examples for Training a model

On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with PostgresSQL

python3 main.py --dataset PSQLTPCH -s 0 -t 250000 --batch_size 128 -epoch_freq 1000 --lr 2e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/postgres_tpch_dataset/tpch1g/900-exp_res_by_temp/

On TPC-H dataset generated using https://github.com/gregrahn/tpch-kit.git with SF=1 and benchmarked with NoisePage

python3 main.py --dataset TerrierTPCH -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 1e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/terrier_dataset/execution_1G.csv

On TPC-C dataset generated using OLTP with SF=1 and benchmarked with NoisePage

python3 main.py --dataset OLTP -s 0 -t 250000 --batch_size 512 -epoch_freq 1000 --lr 5e-3 --step_size 1000 --SGD --scheduler --data_dir ./dataset/oltp_dataset/tpcc_pipeline.csv

Using a pre-trained model

Getting a model trained for 4000 epochs on TPC-H SF=1 dataset benchmarked with PostgresSQL:

wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/psqltpch_epoch4000.zip

Getting a model trained for 20000 epochs on TPC-H SF=1 dataset benchmarked with NoisePage:

wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/terriertpch_epoch20000.zip

Getting a model trained for 10000 epochs on TPC-C dataset generated with OLTP and benchmarked with NoisePage:
```
wget http://www.andrew.cmu.edu/user/jiejiao/data/qpp/trained_models/tpcc_epoch10000.zip
```

Examples for Testing a trained model

When testing, please make sure that trained models are saved in ./saved_model and the mean and range values of the train data are provided with the flag '--mean_range_dict' in order to normalize the input.

The '-s' flag is interpreted as an integer and is used to specify the epoch number of the saved model to be tested.

Testing a model trained for 4000 epochs on TPC-H SF=1 dataset benchmarked with PostgresSQL on TPC-H SF=10 dataset benchmarked with PostgresSQL.

python3 main.py --test_time --dataset PSQLTPCH -s 4000 --mean_range_dict mean_range_dict.pickle --data_dir ./dataset/postgres_tpch_dataset/tpch10G/900-exp_res_by_temp/

Testing a model trained for 10000 epochs on the TPC-C benchmark on the smallbank dataset. Please make sure that models are saved in ./saved_model
```
python3 main.py --test_time --dataset OLTP -s 10000 --data_dir ./dataset/oltp_dataset/sb_pipeline.csv
```

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
dataset		dataset
.gitignore		.gitignore
README.md		README.md
environment.yml		environment.yml
main.py		main.py
metric.py		metric.py
model_arch.py		model_arch.py
pg_snapshot.py		pg_snapshot.py
pg_utils.py		pg_utils.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

QPPNet in PyTorch

Prerequisites

Getting Started

Installation

Getting the Datasets

Examples for Training a model

Using a pre-trained model

Examples for Testing a trained model

About

Releases 1

Packages

Languages

cmu-db/QPPNet

Folders and files

Latest commit

History

Repository files navigation

QPPNet in PyTorch

Prerequisites

Getting Started

Installation

Getting the Datasets

Examples for Training a model

Using a pre-trained model

Examples for Testing a trained model

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages