ZFundPredictor (ver. 1.2.0)

This project aims to build a stable price predictor for open-end mutual funds available in China's fund market.

It consists of:

A web-scraping framework that extracts data of funds from http://fund.eastmoney.com/, and stores the data in a local database.
An ETL pipeline that mainly transforms the raw data into datasets for machine learning models.
A machine learning/deep learning prediction tool (currently based on LSTM only - more to come in the future).

Prerequisites

This project is written in Python with data stored into a MySQL database.

Key libraries include Scrapy · PyMySQL · Numpy · Pandas · Matplotlib · Seaborn · SciKit-Learn · TensorFlow (tf.keras)

Getting Started

Scrapy Framework

The following raw data of a fund will be collected using spider em.py:
- Fund Name, Ticker, Type (stock/stock index/bond/hybrid)
- Current Investment Style (large-/mid-/small-cap, value/growth)
- Current Net Assets, Net Assets of the last two reporting periods
- Current Asset Allocation (% of stock/bond/cash)
- Current Industry Allocation (top 5, and their respective weights)
- Manager Performance: Current Ranking Score in similar funds (between 0 and 1, the larger the better)
- Historical Prices from launching
- Performance Metrics: Fund Returns (recent 1d/1m/6m/1y/2y/3y), Standard Deviation (recent 1y/2y/3y), Sharpe Ratio (recent 1y/2y/3y)
Note: data are updated daily by the scraped website, except for asset size, asset allocation and industry allocation, which are updated per reporting period (season).

Complete source code of this framework can be found here.
Data Prepossessing
```
from ZFundETL import FundETL
ETL = FundETL()
funds, categorical = ETL.quick_prepossessing()
```
For more details, please check the demo.

This process mainly generates two datasets.
- funds: contains the historical daily prices of available funds during the selected period, plus the daily returns of the benchmarks (stock or bond index).
- categorical: contains all the short-term invariant features, including fund types, fund styles, asset size, ranking scores (manager performance), asset allocation, and industry allocation.

Predictor

from ZFundPredictor import FundPredictor
predictor = FundPredictor(funds)
single_prediction = predictor.get_prediction(ticker, **params)
ensemble_prediction = predictor.ensemble_prediction(ticker, **tune_params)

For more details, please see the following section and the demo.

Predicting Strategy

The current predicting algorithm is based on LSTM with sliding windows.

Key Terms
- Lookback: the length of a sliding window, indicating how many periods we are looking back to the past.
- Lookahead: the number of periods that we want to predict into the future.
- MA: moving average indicators.
  - dist_EMA: the distances between fund price and x-day exponential moving average (EMA) of the fund on each day.
  - signal_EMA: one-time EMA signal. -1 if the shorter EMA crosses above the longer EMA on that day, 1 vice versa, and 0 if there is no cross.
  - status_BB: short-term status of the Bollinger Bands indicator. -1 if the value of x-day BB indicator is below the lower band, 1 if it is above the upper band, and 0 otherwise.
Currently in use:
- lookback, lookahead: [50, 1], [120, 2] and [120, 5]
- MA: [None], [20, 50] or [5, 50]
  - 'ema' - adopt only the EMA indictors dist_ema (e.g. dist_ema5 refers to 5-day EMA, and dist_ema50 refers to 50-day EMA).
  - 'all' - adopt all available indicators.
Splitting Method

LSTM Time Series

Architecture and training/predicting methods

A model with a 120-day Lookback period and a 5-day Lookahead period would look like this:

Layer (type)	Output Shape	Param #
lstm_0 (LSTM)	(None, 120, 256)	268288
dropout_0 (Dropout)	(None, 120, 256)	0
lstm_1 (LSTM)	(None, 256)	525312
dropout_1 (Dropout)	(None, 256)	0
dense_0 (Dense)	(None, 32)	8224
dense_1 (Dense)	(None, 5)	165
		Total params: 801,989
		Trainable params: 801,989

When get_prediction() is called, the training set will go through the above layers and a single model prediction will be generated with the provided parameters.

ensemble_prediction() allows a basket of MA options and a basket of dropout options. Each combination will form a new model, and based on their R-squared scores, the final prediction will be the weighted average of some of these models.

An illustration of ensemble_prediction():

Hyperparameter Tuning

Epochs: set to a large number
Batch Size: 300 or 400
Learning Rate: 0.0005
Early Stopping: monitor='val_loss', min_delta=1e-5, patience=5, restore_best_weights=True On average, the training stops between epoch 10 and 20.
Dropout: 0.2 or 0.3

Future plan

The future version will introduce categorical features after the LSTM time series forcasting process to account for individual effects and to provide more stable predictions.
So far, the models have only been trained and tested on funds that mainly invest in stocks. In the future, this predictor might be generalized to include bond funds as well.
A fund portfolio builder and monitor...

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
demo		demo
eastm		eastm
LICENSE		LICENSE
LSTM_ensemble.png		LSTM_ensemble.png
README.md		README.md
ZFundETL.py		ZFundETL.py
ZFundPredictor.py		ZFundPredictor.py
splitting_strategy.png		splitting_strategy.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ZFundPredictor (ver. 1.2.0)

Prerequisites

Getting Started

Scrapy Framework

Data Prepossessing

Predictor

Predicting Strategy

Key Terms

Splitting Method

LSTM Time Series

Architecture and training/predicting methods

Hyperparameter Tuning

Future plan

Meta

About

Releases

Packages

Languages

License

ZxZ7/ZFundPredictor

Folders and files

Latest commit

History

Repository files navigation

ZFundPredictor (ver. 1.2.0)

Prerequisites

Getting Started

Scrapy Framework

Data Prepossessing

Predictor

Predicting Strategy

Key Terms

Splitting Method

LSTM Time Series

Architecture and training/predicting methods

Hyperparameter Tuning

Future plan

Meta

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages