Use graph-based analysis to re-classify stocks and experiment with different re-classification methodologies to improve Markowitz portfolio optimization performance in a low-frequency quantitative trading context.
Note that for strategy confidentiality, many files are hidden.
To speed up development, the current code structure sacrifices simplicity. This will be addressed in later versions.
Project Website: Dynamic Stock Industrial Classification
This project contains the following six modules:
- data ingestion: address finance data I/O and handle storage of intermediate results;
- factor generation: compute and store alpha factors and risk factors for low-frequency trading;
- backtest: low-frequency backtest framework (for both factors and signals). Factors take continuous values on each cross section, whereas signals take only the values -1, 0, and 1;
- factor combination: combine factors using ML models;
- portfolio optimization: Markowitz portfolio optimization, with turnover, industrial exposure, style exposure, and various other constraints;
- graph cluster: experiment with different graph-based clustering methods on stocks.
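To make the factor/signal distinction in the backtest module concrete, here is a minimal sketch that maps one cross section of continuous factor values to signals in {-1, 0, 1}. The function name `to_signal`, the tickers, and the 20%/80% rank thresholds are illustrative assumptions; the project's actual signal construction is not described here.

```python
def to_signal(factor_values, lower_q=0.2, upper_q=0.8):
    """Map one cross section of factor values to signals in {-1, 0, 1}.

    factor_values: dict of ticker -> continuous factor value.
    lower_q / upper_q: hypothetical rank thresholds (assumptions, not project settings).
    """
    ranked = sorted(factor_values, key=factor_values.get)  # ascending by factor value
    n = len(ranked)
    signals = {}
    for rank, ticker in enumerate(ranked):
        pct = (rank + 0.5) / n  # mid-rank percentile in (0, 1)
        if pct <= lower_q:
            signals[ticker] = -1  # lowest-ranked names
        elif pct >= upper_q:
            signals[ticker] = 1   # highest-ranked names
        else:
            signals[ticker] = 0   # no position
    return signals


# Hypothetical cross section for a single date
factor = {"A": 0.8, "B": -1.2, "C": 0.1, "D": 2.3, "E": -0.4}
signal = to_signal(factor)
```

With these inputs, only the lowest-ranked ticker receives -1 and only the highest-ranked ticker receives 1; the rest are 0.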
The data cover China A-share stocks, the corresponding major index data (sz50, hs300, zz500, zz1000), and the member stock weights from 20150101 to 20211231, provided by Shanghai Probability Quantitative Investment.
With a fixed set of ML predictions, we run the optimization pipeline once for each trained classification.
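For orientation, one common way to write a benchmark-relative Markowitz problem with the kinds of constraints listed above is sketched below. This is a generic textbook-style formulation, not necessarily the exact objective or constraint set used in this project. All symbols are assumptions introduced for illustration: $\alpha$ is the vector of predicted returns, $w$ the portfolio weights, $w_b$ the benchmark weights, $w_{\text{prev}}$ the previous holdings, $\Sigma$ the estimated covariance matrix, $\lambda$ a risk-aversion parameter, $\tau$ a turnover budget, $H$ a stock-by-group membership matrix (industry or cluster labels), $X$ a matrix of style exposures, and $\delta_{\text{ind}}$, $\delta_{\text{style}}$ exposure tolerances. The long-only constraint is also an assumption.

```latex
\begin{aligned}
\max_{w}\quad & \alpha^{\top} w \;-\; \frac{\lambda}{2}\,(w - w_b)^{\top} \Sigma\,(w - w_b) \\
\text{s.t.}\quad & \mathbf{1}^{\top} w = 1,\qquad w \ge 0, \\
& \lVert w - w_{\text{prev}} \rVert_1 \le \tau, \\
& \lvert H^{\top}(w - w_b) \rvert \le \delta_{\text{ind}}, \\
& \lvert X^{\top}(w - w_b) \rvert \le \delta_{\text{style}}.
\end{aligned}
```

Under this reading, a re-classification would change only the membership matrix $H$, which is presumably how each trained classification enters the pipeline while the predictions $\alpha$ stay fixed.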
Stock Pool: zz1000 member stocks
Benchmark: zz1000 index
Time Period: 20170701 - 20211231
Model | AlphaReturn (cumsum) | AlphaSharpe | AlphaDrawdown | Turnover |
---|---|---|---|---|
LinearRegressor | 71.58 | 1.92 | -19.84 | 1.01 |
LgbmRegressor | 145.64 | 3.65 | -11.58 | 1.21 |
LgbmRegressor-opt | 146.73 | 2.96 | -29.79 | 1.11 |
.. | .. | .. | .. | .. |
40-cluster PMFG Unfiltered Spectral | 154.45 | 3.15 | -22.69 | 1.11 |
10-cluster PMFG Filtered Average Linkage | 160.95 | 3.32 | -26.77 | 1.11 |
30-cluster AG Unfiltered Sub2Vec | 160.96 | 3.24 | -23.05 | 1.10 |
5-cluster MST Unfiltered Sub2Vec | 163.26 | 3.27 | -27.39 | 1.11 |
20-cluster PMFG Filtered Node2Vec | 164.68 | 3.30 | -27.06 | 1.11 |
Compared to the original optimization result, we observe a 12.23% improvement in excess return and 12.16% improvement in excess Sharpe ratio.
Since factors based on price and volume lost their predictive power starting from 20200701, we also look at the performance before that date.
Time Period: 20170701 - 20200701
Model | AlphaReturn (cumsum) | AlphaSharpe | AlphaDrawdown | Turnover |
---|---|---|---|---|
LgbmRegressor | 150.64 | 6.06 | -4.59 | 1.23 |
LgbmRegressor-opt | 170.31 | 5.43 | -6.76 | 1.12 |
.. | .. | .. | .. | .. |
10-cluster PMFG Filtered Sub2Vec | 173.10 | 5.49 | -5.51 | 1.12 |
5-cluster MST Filtered Sub2Vec | 182.89 | 5.78 | -7.14 | 1.12 |
10-cluster AG Filtered Sub2Vec | 181.50 | 5.64 | -7.40 | 1.12 |
20-cluster PMFG Filtered Node2Vec | 184.21 | 5.85 | -6.42 | 1.12 |
In this period, we observe an 8.16% improvement in excess return and a 7.73% improvement in excess Sharpe ratio, compared to the original optimization result.
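The improvement figures quoted for both periods can be reproduced from the two tables with simple arithmetic. The sketch below assumes that "the original optimization result" is the LgbmRegressor-opt row and that each figure compares it with the best value in that column among the listed cluster rows. This reading is inferred from the numbers rather than stated explicitly.

```python
def relative_improvement(baseline, candidate):
    """Percentage improvement of candidate over baseline."""
    return (candidate - baseline) / baseline * 100.0


# 20170701 - 20211231, baseline: LgbmRegressor-opt (assumed)
full_return = relative_improvement(146.73, 164.68)  # 20-cluster PMFG Filtered Node2Vec
full_sharpe = relative_improvement(2.96, 3.32)      # 10-cluster PMFG Filtered Average Linkage

# 20170701 - 20200701, baseline: LgbmRegressor-opt (assumed)
early_return = relative_improvement(170.31, 184.21)  # 20-cluster PMFG Filtered Node2Vec
early_sharpe = relative_improvement(5.43, 5.85)      # 20-cluster PMFG Filtered Node2Vec

for name, value in [
    ("full-period return", full_return),
    ("full-period Sharpe", full_sharpe),
    ("early-period return", early_return),
    ("early-period Sharpe", early_sharpe),
]:
    print(f"{name}: {value:.2f}%")
```

Note that in the full period the best return and the best Sharpe ratio come from different cluster rows.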
For a complete list of results, see summary_20170701_20211231.csv and summary_20170701_20200701.csv. More details are discussed on the project website listed above.
To run the code in this project, it is recommended to create the environment specified in environment.yml. If conda is installed, run:
conda env create -f environment.yml
conda activate finance-base
Alternatively, pull the corresponding docker image from yangshengaa/finance-base and then activate the finance-base environment using the second conda command above.
It's very easy to use this platform!
Tips:
- run one module at a time, and run the following commands sequentially;
- change the config for each module in its respective file (file locations are indicated inside run.py);
- detailed running instructions, including a walkthrough of the parameters in each module, are in the README of each module.
To run each module, in the current directory:
Factor Generation:
- factor generation:
python run.py gen
Backtest:
- backtest factor:
python run.py backtest_factor
- backtest signal:
python run.py backtest_signal
Factor Combination:
- factor combination:
python run.py comb
Portfolio Optimization:
- generate factor returns:
python run.py opt_fac_ret
- estimate covariance matrices:
python run.py opt_cov_est
- adjust weight:
python run.py opt_weight
Graph Clustering:
- train graph clustering:
python run.py cluster_train
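As a rough illustration of what graph-based stock clustering can look like, the sketch below builds a correlation-distance graph from return series and splits its minimum spanning tree (MST) into a chosen number of clusters. It is a deliberately simple stand-in, not the project's implementation: the results tables refer to spectral, Node2Vec, and Sub2Vec clustering on MST, PMFG, and AG graphs, none of which is reproduced here. The function names, tickers, and return values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mst_clusters(returns, n_clusters):
    """Cluster tickers by running Kruskal's algorithm until n_clusters remain.

    Stopping Kruskal early is equivalent to building the full MST and
    removing its (n_clusters - 1) longest edges.
    returns: dict of ticker -> list of returns (hypothetical input layout).
    """
    tickers = sorted(returns)
    edges = []
    for a, b in combinations(tickers, 2):
        rho = pearson(returns[a], returns[b])
        # Standard correlation distance; max() guards against rounding error.
        dist = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
        edges.append((dist, a, b))
    edges.sort()

    parent = {t: t for t in tickers}  # union-find forest

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    n_components = len(tickers)
    for _, a, b in edges:
        if n_components <= n_clusters:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            n_components -= 1

    clusters = {}
    for t in tickers:
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


# Hypothetical daily returns: A and B move together, C and D move together
returns = {
    "A": [0.010, -0.020, 0.030, -0.010, 0.020],
    "B": [0.012, -0.018, 0.028, -0.011, 0.021],
    "C": [-0.010, 0.020, -0.030, 0.010, -0.020],
    "D": [-0.011, 0.019, -0.031, 0.012, -0.018],
}
clusters = mst_clusters(returns, n_clusters=2)
```

The resulting cluster labels could then stand in for industry labels in a downstream exposure constraint, which appears to be the role the trained classifications play in this project.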
To run each submodule, in the current directory:
- generate pairs factors:
python run.py pairs
- generate risk factors:
python run.py gen_risk
Currently the risk attribution module is very slow and suboptimal. This will be addressed later.
Special thanks to coworkers and my best friends at Shanghai Probability Quantitative Investment: Beilei Xu, Zhongyuan Wang, Zhenghang Xie, Cong Chen, Yihao Zhou, Weilin Chen, Yuhan Tao, Wan Zheng, and many others. This project would be impossible without their data, insights, and experiences.
Log known issues here:
- signals generated by the factor test do not reproduce the alpha returns of the signal test (they are slightly lower)
- examine output holding stats
- plain risk attribution