This solution uses a Bayesian hierarchical model largely based on the ceshine/kaggle-winton-2016 repo (which is in turn based on Tsakalis Kostas's model).
WARNING: This is a very simple baseline model. It is not ready for real trading.
Unzip TBrain_Round2_DataSet_20180615.zip to get the sample data. Everything in it is public information. The zip file is provided only to make reproducing the results easier.
Please check the Stan model.
It uses the last five trading days to predict the price at the end of the target day. We train one independent model for each weekday (so exactly five models are trained).
Public holidays are ignored (which is not ideal and definitely can be improved).
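The sketch below illustrates one possible reading of the setup described above: each sample takes the five trading days immediately before a target day as features, and samples are grouped by the weekday of the target so that each group can feed its own model. The function name build_weekday_samples and the toy prices are hypothetical, and the actual feature construction in scripts/bayesian.py may differ.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_weekday_samples(closes, window=5):
    """Group (features, target) pairs by the weekday of the target day.

    closes: list of (date, price) tuples, sorted by date, trading days only.
    Returns a dict mapping weekday (0 = Monday .. 4 = Friday) to samples.
    """
    samples = defaultdict(list)
    for i in range(window, len(closes)):
        target_day, target_price = closes[i]
        # Features: prices of the `window` trading days before the target.
        features = [price for _, price in closes[i - window:i]]
        samples[target_day.weekday()].append((features, target_price))
    return dict(samples)


# Toy data (made up): two weeks of weekdays starting Monday 2018-06-04.
# Public holidays are not handled, mirroring the limitation noted above.
start = date(2018, 6, 4)
calendar_days = [start + timedelta(days=k) for k in range(14)]
trading_days = [d for d in calendar_days if d.weekday() < 5]
closes = [(d, 100.0 + i) for i, d in enumerate(trading_days)]

samples = build_weekday_samples(closes)
for weekday in sorted(samples):
    print(weekday, samples[weekday])
```

Each of the five groups would then be fitted separately, which is what yields five models.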
A Dockerfile is included.
- First, build the image (example: docker build -t pystan .)
- Start a container and mount the project directory (example: docker run -ti -v $(pwd):/lab pystan bash)
- Run the run.sh script (example: cd /lab && ./run.sh)
The predictions will be saved in cache/baseline.csv.
Change the target_date variable in the make_submission function in scripts/bayesian.py to the first day (t) you want to predict. The script will output predictions from t to t+4.
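As a rough illustration of the range t to t+4, the sketch below lists five prediction dates starting at a given target date. It assumes that the offsets count weekdays only and that public holidays are ignored, which the text above does not state explicitly. The helper prediction_dates is hypothetical and is not part of scripts/bayesian.py.

```python
from datetime import date, timedelta


def prediction_dates(target_date, horizon=5):
    """Return `horizon` weekdays starting at target_date (assumed rule).

    Weekends are skipped; public holidays are NOT skipped.
    """
    dates = []
    current = target_date
    while len(dates) < horizon:
        if current.weekday() < 5:  # Monday = 0 .. Friday = 4
            dates.append(current)
        current += timedelta(days=1)
    return dates


# Example: t is Monday 2018-06-18, so the range covers that whole week.
dates = prediction_dates(date(2018, 6, 18))
print([d.isoformat() for d in dates])
```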
- scripts/fix_csv_files.py: convert Big-5 to UTF-8 and fix some trailing commas.
- scripts/preprocess.py: convert the CSV files into a more computer-friendly data frame and store it as a feather file.
- scripts/bayesian.py: model training, evaluation, and prediction.
- scripts/prepare_features.py: some very basic feature engineering utility functions.
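To give a feel for the kind of cleanup scripts/fix_csv_files.py performs, here is a minimal sketch that decodes Big-5 bytes and strips trailing commas from every line. The function fix_csv_text and the sample rows are hypothetical, and the real script may handle more cases or work differently.

```python
def fix_csv_text(raw_bytes):
    """Decode Big-5 bytes and drop trailing commas from each line.

    Note: this strips ALL trailing commas, so a row whose last fields
    are legitimately empty would lose them. That is acceptable only
    when the trailing commas are known to be export artifacts.
    """
    text = raw_bytes.decode("big5", errors="replace")
    fixed_lines = [line.rstrip().rstrip(",") for line in text.splitlines()]
    return "\n".join(fixed_lines) + "\n"


# Made-up sample rows, encoded as Big-5 with a trailing comma on each line.
sample = "代碼,日期,收盤價,\n0050,20180615,81.5,\n".encode("big5")
fixed = fix_csv_text(sample)

# The result is a regular Python string; encode it as UTF-8 when writing.
utf8_bytes = fixed.encode("utf-8")
print(fixed)
```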