Releases: ScottfreeLLC/AlphaPy
LightGBM and CatBoost
Python Ecosystem Upgrade
This release contains bug fixes and refactoring based on major releases in the Python ecosystem.
Feature Names and Encoders
This release contains the following features:
- Feature Names are now shown in the Feature Importance Plots.
- Categorical Encoders have been updated: https://contrib.scikit-learn.org/categorical-encoding/
- Confusion Matrices now have counts along with percentages.
- This release has been updated for pandas 1.0 and scikit-learn 0.22.
- Bug Fixes and Refactoring. Please report any problems to: https://github.com/ScottfreeLLC/AlphaPy/issues
Data Feed Update
This release contains the following features:
- The market section of the market.yml file has been augmented with the fields subschema, api_key_name, and api_key. A subschema refers to a specific feed within a service. For example, WIKI is a stock feed within Quandl, so schema is set to quandl and subschema is set to wiki. API keys, which are required to access IEX and Quandl, are now supported as well. The api_key_name is the name of the environment variable defined by the platform; you must register for an API key and insert its value in the api_key field, as shown in the example below.
market:
    create_model : True
    data_fractal : 1w
    data_history : 2000
    forecast_period : 1
    fractal : 1d
    lag_period : 1
    leaders : []
    predict_history : 50
    schema : iex
    subschema :
    api_key_name : IEX_TOKEN
    api_key : xx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    subject : stock
    target_group : tech
- IEX data has been migrated to the cloud and requires an API key. We use the Python package iexfinance to get historical daily and intraday data. Because IEX has different pricing tiers, you may run out of quota if you are on the free tier.
- Yahoo daily data is still available through pandas-datareader (no API key required). Intraday data is fetched via a custom Yahoo URL and is rate-limited.
- There are other sources of stock prices via pandas-datareader. Depending upon the API, we may or may not support these specific feeds.
- The Quandl WIKI feed discontinued its daily prices, but you can still get historical data from before March 27th, 2018.
- The Google Finance API is no longer available.
- Note that AlphaPy fetches all of its data from the aforementioned feeds on the fly. If you get historical prices elsewhere, you can drop the files in the data directory (use schema data), and AlphaPy will automatically load these data for you.
- We recommend using IEX as your primary data feed, but you can use Yahoo if you just need daily data.
- AlphaPy used to expect separate files for train and test data. Now you can provide a single train file, which will be split into train and test sets. You can adjust the split ratio in the model.yml file.
- When AlphaPy runs in prediction mode, it uses predict_history from the market.yml file to build those features that require data from previous time periods, e.g., a 50-day moving average. Thus, the predict_history field should be set to the number of time periods for the feature that goes furthest back in time. (N.B. This does not apply to model training; that it previously did was a bug fixed in this release.)
- The package joblib is now imported directly, and many of the multiprocessing bugs seem to have been fixed.
- A bug in dropping columns has been fixed.
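The predict_history guidance above amounts to taking the longest lookback among the features in use. The following is a minimal sketch of that rule, assuming a hypothetical mapping of feature names to lookback windows; the helper `required_predict_history` and the feature names are illustrative and are not part of AlphaPy.

```python
def required_predict_history(lookbacks):
    """Return the smallest predict_history that covers every feature.

    lookbacks: hypothetical mapping of feature name -> number of past
    periods the feature needs (e.g., 50 for a 50-day moving average).
    """
    if not lookbacks:
        return 0
    # The feature that reaches furthest back determines the setting.
    return max(lookbacks.values())


# Hypothetical feature set: the 50-day moving average dominates.
features = {"ma_50": 50, "rsi_14": 14, "roc_5": 5}
print(required_predict_history(features))  # 50
```

With these illustrative features, predict_history would be set to 50 in market.yml, matching the example configuration.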
IEX
This release contains the following features:
- The IEX exchange is now the default end-of-day stock feed. You can download up to 5 years of historical daily data.
- Several market data bugs have been fixed.
- The multiprocessing fork code has been removed. If you're still having problems with Keras and multiprocessing, then set number_jobs to 1 in the model.yml file.
Keras
This release contains the following features:
- You can now use the Keras Sequential model for classification and regression. Please refer to the KERASC and KERASR entries in the algos.yml configuration file. AlphaPy automatically adds the input_shape argument based on the shape of the training set. Each model is limited to 10 layers.
KERASC:
    # Keras Classification
    model_type : classification
    layers     : ["Dense(12, activation='relu')",
                  "Dense(1, activation='sigmoid')"]
    compiler   : {"optimizer" : 'rmsprop',
                  "loss" : 'binary_crossentropy',
                  "metrics" : 'accuracy'}
    params     : {"epochs" : 50,
                  "batch_size" : 10,
                  "verbose" : 1}
    grid       : {}
KERASR:
    # Keras Regression
    model_type : regression
    layers     : ["Dense(10, activation='relu')",
                  "Dense(1)"]
    compiler   : {"optimizer" : 'rmsprop',
                  "loss" : 'mse'}
    params     : {"epochs" : 50,
                  "batch_size" : 10,
                  "verbose" : 1}
    grid       : {}
- Redundant RFE code and estimator classes have been removed. RFE is performed only when the coef_ or feature_importances_ attribute is present.
- Only those metrics that are relevant to the model type (classification or regression) are now reported (logged).
- Brier Score and Cohen's Kappa have been added to the classification metrics.
- The scoring field has been removed from the algos.yml configuration file. Use scoring_function in the model.yml file instead.
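For readers unfamiliar with the two classification metrics mentioned above, the sketch below computes them from their standard definitions in plain Python. It is an illustration only and does not show how AlphaPy computes them internally; the function names and sample data are assumptions made for this example.

```python
from collections import Counter


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    # Observed agreement: fraction of samples where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if the two labelings were independent.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


# Illustrative data: five samples with predicted probabilities of class 1.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(round(brier_score(y_true, y_prob), 3))  # 0.116
print(round(cohen_kappa(y_true, y_pred), 3))  # 0.615
```

A lower Brier score indicates better-calibrated probabilities, while a kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance.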
Unify
This release connects models with systems for MarketFlow, i.e., you can now use the probabilities generated by a classifier as trading signals. For example:
system:
    name : alpha
    holdperiod : 0
    longentry : phigh_0.6
    longexit :
    shortentry :
    shortexit :
    scale : False
The variables phigh and plow are defined by AlphaPy. For example, with longentry set to phigh_0.6, the system goes long when the probability associated with a time series prediction is greater than or equal to 0.6. Likewise, a short entry might have a value of plow_0.4, meaning the probability is less than or equal to 0.4.
This release also fixes a bug with '.' in variable names, such as the examples shown above.
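The threshold rule for phigh and plow can be sketched as follows. This is a hypothetical illustration of the semantics described above, not AlphaPy's own parsing code; the helpers `parse_signal` and `signal_fires` are names invented for this example.

```python
def parse_signal(signal):
    """Split a signal such as 'phigh_0.6' into its name and threshold."""
    name, _, value = signal.partition("_")
    if name not in ("phigh", "plow") or not value:
        raise ValueError(f"unsupported signal: {signal!r}")
    return name, float(value)


def signal_fires(signal, probability):
    """Return True if the model probability satisfies the signal."""
    name, threshold = parse_signal(signal)
    if name == "phigh":
        # phigh_X: probability greater than or equal to X
        return probability >= threshold
    # plow_X: probability less than or equal to X
    return probability <= threshold


print(signal_fires("phigh_0.6", 0.72))  # True
print(signal_fires("plow_0.4", 0.55))   # False
```

Note that the threshold itself contains a '.', which is the kind of variable name affected by the bug fix mentioned above.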
The market section of the market.yml file has been changed to add the fields create_model and data_fractal (resample_data has been removed). If fractal differs from data_fractal, then the data are resampled to the fractal value. Set create_model to False if you wish to test different systems after creating your initial model, or if your systems are free-standing and don't use the output of a model.
market:
    create_model : True
    data_fractal : 1min
    data_history : 100
    forecast_period : 1
    fractal : 20min
    lag_period : 1
    leaders : []
    predict_history : 50
    schema : data
    subject : crypto
    target_group : btc
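To make the resampling step concrete, here is a standard-library sketch of how 1-minute OHLCV bars could be aggregated into 20-minute bars, as in the configuration above. AlphaPy's actual resampling code is not shown in these notes; the function `resample_bars`, the bar tuple layout, and the sample prices are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def resample_bars(bars, minutes):
    """Aggregate (timestamp, open, high, low, close, volume) bars into
    buckets of `minutes` minutes, each labeled by its start time."""
    buckets = {}
    for ts, o, h, l, c, v in sorted(bars, key=lambda bar: bar[0]):
        # Floor the timestamp to the start of its bucket, counted from midnight.
        minute_of_day = ts.hour * 60 + ts.minute
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + timedelta(minutes=(minute_of_day // minutes) * minutes)
        if start not in buckets:
            buckets[start] = [o, h, l, c, v]  # first bar sets the open
        else:
            bar = buckets[start]
            bar[1] = max(bar[1], h)  # highest high
            bar[2] = min(bar[2], l)  # lowest low
            bar[3] = c               # last close
            bar[4] += v              # total volume
    return [(start, *values) for start, values in sorted(buckets.items())]


# Forty synthetic 1-minute bars starting at 09:00.
t0 = datetime(2018, 1, 2, 9, 0)
one_minute = [
    (t0 + timedelta(minutes=i), 100 + i, 101 + i, 99 + i, 100.5 + i, 10)
    for i in range(40)
]
for bar in resample_bars(one_minute, 20):
    print(bar)
```

The forty input bars collapse into two 20-minute bars, one starting at 09:00 and one at 09:20.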
Finally, there is now one general system for both daily and intraday systems. Intraday signals are automatically closed at the end of the day. All systems follow the format shown above, and you can mix model-based signals with technical signals.
Quandl
Support has been added for Quandl data (#15) using pandas-datareader, e.g., you can now specify quandl_wiki in the schema field. Note that WIKI is a source of free end-of-day stock data from Quandl, but its data history is limited. Regarding the state of Google and Yahoo stock data: AlphaPy can still get intraday data from Google, but end-of-day data is no longer available from Google. End-of-day stock data from Yahoo is only sporadically available, so we recommend a paid provider for consistency.
You can now drop files into the data directory: intraday data, daily data, or any other regular time series. Intraday data must have separate date and time columns, along with the OHLCV columns. Data with a daily or longer fractal require only the date column. To use local data, specify schema: data in the market.yml file. The file names must conform to the convention symbol_subject_schema_fractal.csv, e.g., aapl_stock_data_1d.csv.
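The file naming convention above can be sketched with two small helpers. These functions are hypothetical and not part of AlphaPy; they only illustrate how the four fields map to a file name. The parser assumes that symbol, subject, and fractal contain no underscores, while tolerating one in the schema (e.g., quandl_wiki).

```python
from pathlib import Path


def data_file_name(symbol, subject, schema, fractal):
    """Build a file name following symbol_subject_schema_fractal.csv."""
    return f"{symbol}_{subject}_{schema}_{fractal}.csv"


def parse_data_file_name(path):
    """Split a conforming file name back into its four fields."""
    parts = Path(path).stem.split("_")
    if len(parts) < 4:
        raise ValueError(f"file name does not follow the convention: {path!r}")
    # First two fields and the last field are fixed; the rest is the schema.
    symbol, subject, *schema, fractal = parts
    return {
        "symbol": symbol,
        "subject": subject,
        "schema": "_".join(schema),
        "fractal": fractal,
    }


print(data_file_name("aapl", "stock", "data", "1d"))  # aapl_stock_data_1d.csv
print(parse_data_file_name("btc_crypto_data_1min.csv"))
```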
We added a new crypto subject to test AlphaPy on cryptocurrency data. We will contribute another tutorial shortly for testing an open range breakout system on btc_crypto_data_1min.csv. Note that fractals now conform to pandas time series offset aliases if the user chooses to resample from the original data (#16).