diff --git a/18_convolutional_neural_nets/06_cnn_for_trading_features_to_clustered_image_format.ipynb b/18_convolutional_neural_nets/06_cnn_for_trading_features_to_clustered_image_format.ipynb new file mode 100644 index 000000000..97a8a20b8 --- /dev/null +++ b/18_convolutional_neural_nets/06_cnn_for_trading_features_to_clustered_image_format.ipynb @@ -0,0 +1,711 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# CNN for Trading - Part 2: From Time-Series Features to Clustered Images" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T16:39:56.783513Z", + "start_time": "2021-02-23T16:39:56.774804Z" + } + }, + "source": [ + "To exploit the grid-like structure of time-series data, we can use CNN architectures for univariate and multivariate time series. In the latter case, we consider different time series as channels, similar to the different color signals.\n", + "\n", + "An alternative approach converts a time series of alpha factors into a two-dimensional format to leverage the ability of CNNs to detect local patterns. [Sezer and Ozbayoglu (2018)](https://www.researchgate.net/publication/324802031_Algorithmic_Financial_Trading_with_Deep_Convolutional_Neural_Networks_Time_Series_to_Image_Conversion_Approach) propose CNN-TA, which computes 15 technical indicators for different intervals and uses hierarchical clustering (see Chapter 13, Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning) to locate indicators that behave similarly close to each other in a two-dimensional grid.\n", + "\n", + "The authors train a CNN similar to the CIFAR-10 example we used earlier to predict whether to buy, hold, or sell an asset on a given day. 
They compare the CNN performance to \"buy-and-hold\" and other models and find that it outperforms all alternatives using daily price series for Dow 30 stocks and the nine most-traded ETFs over the 2007-2017 time period.\n", + "\n", + "The section on *CNN for Trading* consists of three notebooks that experiment with this approach using daily US equity price data. They demonstrate \n", + "1. How to compute relevant financial features\n", + "2. How to convert a similar set of indicators into image format and cluster them by similarity\n", + "3. How to train a CNN to predict daily returns and evaluate a simple long-short strategy based on the resulting signals." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Selecting and Clustering Features" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next steps that we will tackle in this notebook are \n", + "1. Select the 15 most relevant features from the 20 candidates to fill the 15×15 input grid.\n", + "2. Apply hierarchical clustering to identify features that behave similarly and order the columns and the rows of the grid accordingly." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Imports & Settings" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:23.064329Z", + "start_time": "2021-02-23T18:59:23.062565Z" + } + }, + "outputs": [], + "source": [ + "import warnings\n", + "warnings.filterwarnings('ignore')" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:23.721271Z", + "start_time": "2021-02-23T18:59:23.065808Z" + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "from pathlib import Path\n", + "import pandas as pd\n", + "from tqdm import tqdm\n", + "\n", + "from scipy.spatial.distance import pdist\n", + "from scipy.cluster.hierarchy import dendrogram, linkage, cophenet\n", + "\n", + "from sklearn.preprocessing import StandardScaler\n", + "from sklearn.feature_selection import mutual_info_regression\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:23.723998Z", + "start_time": "2021-02-23T18:59:23.722349Z" + } + }, + "outputs": [], + "source": [ + "MONTH = 21\n", + "YEAR = 12 * MONTH" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:23.737796Z", + "start_time": "2021-02-23T18:59:23.724937Z" + } + }, + "outputs": [], + "source": [ + "START = '2001-01-01'\n", + "END = '2017-12-31'" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:23.750549Z", + "start_time": "2021-02-23T18:59:23.738814Z" + } + }, + "outputs": [], + "source": [ + "sns.set_style('white')\n", + "idx = pd.IndexSlice" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "ExecuteTime": { + "end_time": 
"2021-02-23T18:59:23.758227Z", + "start_time": "2021-02-23T18:59:23.751920Z" + } + }, + "outputs": [], + "source": [ + "results_path = Path('results', 'cnn_for_trading')\n", + "if not results_path.exists():\n", + " results_path.mkdir(parents=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Model Data" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:30.844081Z", + "start_time": "2021-02-23T18:59:23.759439Z" + } + }, + "outputs": [], + "source": [ + "with pd.HDFStore('data.h5') as store:\n", + " features = store.get('features')\n", + " targets = store.get('targets')" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:30.896504Z", + "start_time": "2021-02-23T18:59:30.845003Z" + }, + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2378728 entries, ('A', Timestamp('2001-01-02 00:00:00')) to ('ZTS', Timestamp('2017-12-29 00:00:00'))\n", + "Columns: 300 entries, 06_RSI to 85_CMA\n", + "dtypes: float64(300)\n", + "memory usage: 5.3+ GB\n" + ] + } + ], + "source": [ + "features.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T18:59:30.938773Z", + "start_time": "2021-02-23T18:59:30.897581Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2378728 entries, ('A', Timestamp('2001-01-02 00:00:00')) to ('ZTS', Timestamp('2017-12-29 00:00:00'))\n", + "Data columns (total 4 columns):\n", + " # Column Dtype \n", + "--- ------ ----- \n", + " 0 r01_fwd float64\n", + " 1 r01dec_fwd float64\n", + " 2 r05_fwd float64\n", + " 3 r05dec_fwd float64\n", + "dtypes: float64(4)\n", + "memory usage: 81.8+ MB\n" + ] + } + ], + "source": [ + "targets.info()" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "## Select Features using Mutual Information" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To this end, we estimate the mutual information for each indicator and the 15 intervals with respect to our target, the one-day forward returns. As discussed in Chapter 4, Financial Feature Engineering – How to Research Alpha Factors, scikit-learn provides the `mutual_info_regression()` function that makes this straightforward, albeit time-consuming and memory-intensive. \n", + "\n", + "To accelerate the process, we randomly sample 100,000 observations:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.366522Z", + "start_time": "2021-02-23T18:59:30.939773Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 2/2 [21:21<00:00, 640.71s/it]\n" + ] + } + ], + "source": [ + "mi = {}\n", + "for t in tqdm([1, 5]):\n", + " target = f'r{t:02}_fwd'\n", + " # sample a smaller number to speed up the computation\n", + " df = features.join(targets[target]).dropna().sample(n=100000)\n", + " X = df.drop(target, axis=1)\n", + " y = df[target]\n", + " mi[t] = pd.Series(mutual_info_regression(X=X, y=y),\n", + " index=X.columns).sort_values(ascending=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.376177Z", + "start_time": "2021-02-23T19:20:52.367645Z" + } + }, + "outputs": [], + "source": [ + "mutual_info = pd.DataFrame(mi)\n", + "mutual_info.to_hdf('data.h5', 'mutual_info')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.391153Z", + "start_time": "2021-02-23T19:20:52.377122Z" + } + }, + "outputs": [], + "source": [ + "mutual_info = pd.read_hdf('data.h5', 'mutual_info')" + ] + }, + { + "cell_type": "code", + 
"execution_count": 13, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.405177Z", + "start_time": "2021-02-23T19:20:52.392078Z" + } + }, + "outputs": [], + "source": [ + "mi_by_indicator = (mutual_info.groupby(mutual_info.\n", + " index.to_series()\n", + " .str.split('_').str[-1])\n", + " .mean()\n", + " .rank(ascending=False)\n", + " .sort_values(by=1))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.550095Z", + "start_time": "2021-02-23T19:20:52.406148Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAZxUlEQVR4nO3df2wT9/0/8Oc5jkNKMljS2ScgCh9IKgUnkHWqqAQ0wqtJqzQljVOQ1Y2VEIqqCdiHUqCNSPkEkjFU8VGHtPaTkvHjj1pTtShUcT8aqgOxujXdtAGWwFULradUqi8domQOcZxc7vMHwt+vm5BzQuwLb54PCaV39z7f6y7Xpy9v++4taZqmgYiIhGUyugAiIkotBj0RkeAY9EREgmPQExEJjkFPRCS4WRn0W7ZsMboEYYRCIaNLILornp/pMSuD/saNG0aXIIyhoSGjSyC6K56f6TErg56IiGYOg56ISHAMeiIiwTHoiYgEx6AnIhIcg15QHo8HpaWl8X8ej8fokojIIGajC6CZ5/F40NjYiPb2duTn5+P69evxexPcbrfB1RFRuvGKXkAtLS1ob2/H2rVrkZmZibVr16K9vR0tLS1Gl0ZEBmDQCygYDGL16tUJ81avXo1gMGhQRURkJAa9gEpKSrBhwwbMmTMHy5Ytw5w5c7BhwwaUlJQYXRoRGYBBL6CFCxeis7MT9fX16O3tRX19PTo7O7Fw4UKjSyMiA0izcSjB2tpadHR0GF3GfWvOnDmYM2cObt68GZ83b948RKNRRKNRAysjShQMBvmXZhrwWzcCGh4exvDwMHJychCJRJCTk5MQ+kT0YGHXjaBMJhMefvhhSJKEhx9+GCYTf9VEDyr+3y+osbGx+FX8zZs3MTY2ZnBFRGQUdt0I7M5z/fl8f6IHW1JX9H6/H5WVlXA6nWhraxu3/Nq1a9i4cSNKS0vR3t4+brmqqqipqcG2bdvuvWJKmt1ux0cffQS73W50KURkIN0relVV0dzcjBMnTsBms6Gurg4OhwNFRUXxNvPnz0djYyN8Pt+Er3H69GksXboUkUhk5ionXZcvX8aTTz5pdBlEZDDdK/pAIIDCwkIUFBTAYrGgqqpqXKDn5+dj+fLlMJvHv2+Ew2GcP38edXV1M1c16ZIkCbIsw2QyQZZlSJJkdElEZBDdK3pFUSDLcnzaZrMhEAgkvYHW1la8+uqrGBwcTHqdWCzG2/XvUWZmJjIyMgAAGRkZyMzM5HGlWScajfKcnCGT3Y+gG/QT3U+V7NXhuXPnkJeXh9LSUnz66adJrQMAFouFN1HcA0mSMDIygtHRUYyNjWF0dBQjIyOQJInHlWYV3jCVHrpdN7IsIxwOx6
cVRYHVak3qxf/xj3+gu7sbDocDu3btQm9vL3bv3j39aikpy5Ytw+LFi6EoCoDbv7PFixdj2bJlBldGdBvHS0gv3aAvKytDKBRCX18fYrEYvF4vHA5HUi/+yiuvwO/3o7u7G0ePHsXjjz+ON998856LpsktXLgQX331FV5++WX09vbi5ZdfxldffcVn3dCscGe8hGPHjuHChQs4duwYGhsbGfappCXh/Pnz2rp167Sf/vSn2u9+9ztN0zTtvffe09577z1N0zStv79fW7NmjfbjH/9Y+8lPfqKtWbNG+/e//53wGr29vdpLL72UzOa05557Lql2NLGsrCzthRde0Ox2u2YymTS73a698MILWlZWltGlEWl2u13r7u7WNE3Trly5ommapnV3d2t2u93IsoTGh5oJSJIkDA4O4qGHHor3gd66dQtz586d8DMXonTKyMjAyZMn8Zvf/CZ+fu7duxcvvvgiVFU1ujwh8REIAsrKysI777yTMO+dd95BVlaWQRUR/T8LFizAtm3b8Pnnn2NsbAyff/45tm3bhgULFhhdmrAY9ALaunUr9u7di6NHj+LWrVs4evQo9u7di61btxpdGhFu3LiBoaEhNDQ0oLe3Fw0NDRgaGuKjOlKIXTeC2r59O959910MDw8jKysLW7duxbFjx4wuiyh+M9///22+O9OzMI6EwKAXHL+nTLPNnftwTCYTxsbG4j+Bie/boXvHrhsiMsS8efNgMpkwb948o0sRHh9TTESG4GO004dX9ERkiDujnnH0s9TjESYiQ2zbtg29vb0cpyIN2HVDRIZ4++238fbbbxtdxgOBV/RERIJj0BNRWt25Qzs3Nxcmkwm5ubkJ82nmMeiJKK2Gh4fx6KOPIhKJYGxsDJFIBI8++iiGh4eNLk1YDHoiSrvDhw9jbGwMV65cwdjYGA4fPmx0SUJj0BNRWi1atAibNm3CuXPnMDIygnPnzmHTpk1YtGiR0aUJi0FPRGl15MgRqKqK+vp6lJeXo76+Hqqq4siRI0aXJiwGPRGlldvtxltvvYW5c+dCkiTMnTsXb731Ftxut9GlCYvfoyeitHO73XC73XzoXprwip6ISHBJBb3f70dlZSWcTifa2trGLb927Ro2btyI0tJStLe3x+d/8803+PnPf46nn34aVVVVOHXq1MxVTkRESdHtulFVFc3NzThx4gRsNhvq6urgcDhQVFQUbzN//nw0NjbC5/MlrJuRkYF9+/bBbrcjEonA5XJh1apVCesSEVFq6V7RBwIBFBYWoqCgABaLBVVVVeMCPT8/H8uXL4fZnPi+YbVaYbfbAQA5OTlYsmQJFEWZwfKJiEiP7hW9oiiQZTk+bbPZEAgEpryhr7/+GsFgECtWrNBtG4vFEAwGp7wNGi8ajfJY0qzF83PmTPahtm7QTzS0152hwJI1ODiIHTt24PXXX0dOTo5ue4vFwk/iZwi/1UCzGc/P9NDtuvn+IL6KosBqtSa9gZGREezYsQPV1dVYt27d9KokIqJp0w36srIyhEIh9PX1IRaLwev1wuFwJPXimqahsbERS5YswebNm++5WCIimjrdrhuz2YympiY0NDRAVVW4XC4UFxfD4/EAuH3jw7fffguXy4VIJAKTyYRTp07hww8/xGeffYYzZ87gkUcewfr16wEAu3btQkVFRWr3ioiI4iRtok54g9XW1qKjo8PoMoTAPlCazXh+pgfvjCUiEhyDnohIcAx6IiLBMeiJiATHoCciEhyDnohIcAx6IiLBMeiJiATHoCciEhyDnohIcAx6IiLBMeiJiATHoCciEhyDnohIcAx6IiLBMeiJiATHoCciEhyDnohIcEkFvd/vR2VlJZxOJ9ra2sYtv3btGjZu3IjS0lK0t7dPaV0iIkot3aBXVRXNzc04fvw4vF4vurq6cPXq1YQ28+fPR2NjI7Zs2TLldYmIKLV0gz4QCKCwsBAFBQWwWCyoqqqCz+dLaJOfn4/ly5fDbDZPeV0iIk
ots14DRVEgy3J82mazIRAIJPXi0103FoshGAwmtQ2aXDQa5bGkWYvn58wpKSm56zLdoNc0bdw8SZKS2vB017VYLJMWTckLBoM8ljRr8fxMD92uG1mWEQ6H49OKosBqtSb14veyLhERzQzdoC8rK0MoFEJfXx9isRi8Xi8cDkdSL34v6xIR0czQ7boxm81oampCQ0MDVFWFy+VCcXExPB4PAMDtduPbb7+Fy+VCJBKByWTCqVOn8OGHHyInJ2fCdYmIKH0kbaKOdIPV1taio6PD6DKEwD5Qms14fqYH74wlIhIcg56ISHAMeiIiwTHoiYgEx6AnIhIcg56ISHAMeiIiwTHoiYgEx6AnIhIcg56ISHAMeiIiwTHoiYgEx6AnIhIcg56ISHAMeiIiwTHoiYgEx6AnIhJcUkHv9/tRWVkJp9OJtra2ccs1TcOhQ4fgdDpRXV2Ny5cvx5edPHkSVVVVeOaZZ7Br1y4MDw/PXPVERKRLN+hVVUVzczOOHz8Or9eLrq4uXL16NaGN3+9HKBTC2bNncfDgQRw4cAAAoCgKTp8+jT/+8Y/o6uqCqqrwer0p2REiIpqYbtAHAgEUFhaioKAAFosFVVVV8Pl8CW18Ph9qamogSRLKy8sxMDCA/v5+ALffKKLRKEZHRxGNRmG1WlOzJ0RENCGzXgNFUSDLcnzaZrMhEAhM2kaWZSiKgrKyMtTX12Pt2rXIysrCqlWrsHr1at2iYrEYgsHgVPaD7iIajfJY0qzF83PmTDbIum7Qa5o2bp4kSUm1uXnzJnw+H3w+H3Jzc7Fz506cOXMG69evn3SbFouFI8PPkGAwyGNJsxbPz/TQ7bqRZRnhcDg+rSjKuO6X77cJh8OwWq34y1/+gkWLFiEvLw+ZmZlYt24dLly4MIPlExGRHt2gLysrQygUQl9fH2KxGLxeLxwOR0Ibh8OBzs5OaJqGixcvIjc3F1arFQsWLMClS5cwNDQETdPwySefYOnSpSnbGSIiGk+368ZsNqOpqQkNDQ1QVRUulwvFxcXweDwAALfbjYqKCvT09MDpdCI7Oxutra0AgBUrVqCyshLPPfcczGYzSkpKsHHjxtTuERERJZC0iTrYDVZbW4uOjg6jyxAC+0BpNuP5mR68M5aISHAMeiIiwTHoiYgEx6AnIhIcg56ISHAMeiIiwTHoiYgEx6AnIhIcg56ISHAMeiIiwTHoiYgEx6AnIhIcg56ISHAMeiIiwTHoiYgEx6AnIhIcg56I0s7j8aC0tDT+786IdZQaukMJEhHNJI/Hg8bGRrS3tyM/Px/Xr1/Hli1bANwempRmXlJX9H6/H5WVlXA6nWhraxu3XNM0HDp0CE6nE9XV1bh8+XJ82cDAAHbs2IGnnnoKTz/9NC5cuDBz1RPRfaelpQXt7e1Yu3YtMjMzsXbtWrS3t6OlpcXo0oSle0Wvqiqam5tx4sQJ2Gw21NXVweFwoKioKN7G7/cjFArh7NmzuHTpEg4cOID3338fwO1f6po1a/Db3/4WsVgM0Wg0dXtDRLNeMBjE6tWrE+atXr0awWDQoIrEp3tFHwgEUFhYiIKCAlgsFlRVVcHn8yW08fl8qKmpgSRJKC8vx8DAAPr7+xGJRPC3v/0NdXV1AACLxYIf/OAHqdmTB1xpaSkkSRr3b9myZRPOlyQJpaWlRpdND6CSkhJ8/PHHCfM+/vhjDhKeQrpX9IqiQJbl+LTNZkMgEJi0jSzLUBQFZrMZeXl5eO211/DZZ5/BbrejsbERDz300KTbjMVifHefojt/QX3f06e+xP/+Ysld1+NxpnR78cUXsWnTJhw8eBDLli3DyZMnsX//fuzcuZPn4z2Y7I1SN+g1TRs3T5KkpNqMjo7iypUr2L9/P1asWIFDhw6hra0Nv/rVrybdpsVi4bv7jPmSx5JmlZKSEixcuBAtLS0IBoMoKSnBkSNH+EFsCul23ciyjHA4HJ9WFAVWq3XSNuFwGFarFbIsQ5ZlrFixAgDw1F
NP4cqVKzNVOxERJUE36MvKyhAKhdDX14dYLAav1wuHw5HQxuFwoLOzE5qm4eLFi8jNzYXVasWPfvQjyLKML7/8EgDwySefYOnSpanZEyK6L3g8HuzcuRODg4MAgMHBQezcuZPfpU8h3a4bs9mMpqYmNDQ0QFVVuFwuFBcXx38pbrcbFRUV6OnpgdPpRHZ2NlpbW+Pr79+/H7t378bIyAgKCgrw61//OnV7Q0Sz3p49ezAyMpIwb2RkBHv27GH3TYpI2kQd7Aarra1FR0eH0WUIYfE+L0KHq4wugyhOkiTMnz8f8+fPxz//+U8UFhbiu+++w3fffTfh531073hnLBGlnSRJ+P3vfx+/M9blchldktAY9ESUdsPDw6ivr49f0Q8PDxtdktAY9ESUdrdu3UIoFAKA+E9KHT69kojS6vv34ejNp3vHoCeitLrbB678IDZ1GPRERIJj0BORIUwmU8JPSh0eYSIyxNjYWMJPSh0GPRGR4Bj0RESCY9ATEQmOQU9EJDgGPRGR4Bj0RESCY9ATEQmOQU9EJDgGPRGR4JIKer/fj8rKSjidTrS1tY1brmkaDh06BKfTierqaly+fDlhuaqqqKmpwbZt22amaiK67+Xk5CT8pNTRDXpVVdHc3Izjx4/D6/Wiq6sLV69eTWjj9/sRCoVw9uxZHDx4EAcOHEhYfvr0aQ4KTkQJhoaGEn5S6ugGfSAQQGFhIQoKCmCxWFBVVQWfz5fQxufzoaamBpIkoby8HAMDA+jv7wcAhMNhnD9/HnV1danZAyK672RmZiY81CwzM9PgisSmG/SKokCW5fi0zWaDoiiTtpFlOd6mtbUVr776Kp9QR0QAbgf76Ogo8vLyAAB5eXkYHR1lRqSQ7lCCEw0G8P2RYO7W5ty5c8jLy0NpaSk+/fTTpIuKxWIIBoNJt6fJ8ViSkZ599tlx3b0A4heDd35qmhbPlqKiInzwwQfpK1IAJSUld12mG/SyLCMcDsenFUWB1WqdtE04HIbVasWf/vQndHd3w+/3Y3h4GJFIBLt378abb7456TYtFsukRdNUfMljSYb64osvxs3bvn073n33XQwPDyMrKwtbt27FsWPHDKjuwaD7t1JZWRlCoRD6+voQi8Xg9XrhcDgS2jgcDnR2dkLTNFy8eBG5ubmwWq145ZVX4Pf70d3djaNHj+Lxxx/XDXkiEt+xY8cQjUZRuLcL0WiUIZ9iulf0ZrMZTU1NaGhogKqqcLlcKC4uhsfjAQC43W5UVFSgp6cHTqcT2dnZaG1tTXnhRESUHN2gB4CKigpUVFQkzHO73fH/liQJb7zxxqSvsXLlSqxcuXIaJRIR0b1IKuhp9ljxX2dxc2hkSuss3uedUvt52Zm49Ma6Ka1DRLMXg/4+c3NoBKHDVUm3DwaDU/4wdqpvDEQ0u/GLq0REgmPQExEJjkFPRCQ4Bj0RkeAY9EREgmPQExEJjkFPRCQ4Bj0RkeAY9EREgmPQExEJjkFPRCQ4Bj0RkeAY9EREgmPQExEJjkFPRCQ4Bj0RkeCSGnjE7/ejpaUFY2NjeP755/HSSy8lLNc0DS0tLejp6cGcOXNw+PBh2O12fPPNN9izZw/+9a9/wWQyYcOGDfjFL36Rkh0hImNNZ/QzYGoD3XD0s+nRDXpVVdHc3IwTJ07AZrOhrq4ODocDRUVF8TZ+vx+hUAhnz57FpUuXcODAAbz//vvIyMjAvn37YLfbEYlE4HK5sGrVqoR1iUgMUx39DJj6CGgc/Wx6dLtuAoEACgsLUVBQAIvFgqqqKvh8voQ2Pp8PNTU1kCQJ5eXlGBgYQH9/P6xWK+x2OwAgJycHS5YsgaIoqdkTIiKakO4VvaIokGU5Pm2z2RAIBCZtI8syFEWB1WqNz/v6668RDAaxYsUK3aJisRiCwWBSO/AgmsqxiUaj0zqWPP40HVM9b6ZzfvLcnNhkfxnpBr2maePmSZI0pTaDg4PYsWMHXn/9deTk5OhtEhaLZcoDWj84vp
zSsZnO4OBT3QbRbVM/b6Z+fvLcnA7drhtZlhEOh+PT379Sn6hNOByOtxkZGcGOHTtQXV2Ndev4IQoRUbrpBn1ZWRlCoRD6+voQi8Xg9XrhcDgS2jgcDnR2dkLTNFy8eBG5ubmwWq3QNA2NjY1YsmQJNm/enLKdICKiu9PtujGbzWhqakJDQwNUVYXL5UJxcTE8Hg8AwO12o6KiAj09PXA6ncjOzkZraysA4O9//zvOnDmDRx55BOvXrwcA7Nq1CxUVFSncJSIywkP/8d8oO7Vv6iv+dSrbsAGY2jd7CJC0iTrYDVZbW4uOjg6jy5iVlv3Pk8iYk9pvLqlRG65s+yil2yDxLN7nTcvXK6e6DUryhimaPW599Z9TOtGn82Esv6tMJBY+AoGISHAMeiIiwTHoiYgEx6AnIhIcg56ISHAMeiIiwTHoiYgEx+/RE9GMmd49GF8m3XJeduY0Xp8Y9EQ0I6ZzxyrvdE0PBv19aOpXTclfMQG8aiISDYP+PjPVqx9eMRERP4wlIhIcg56ISHAMeiIiwTHoiYgEx6AnIhIcg56ISHBJBb3f70dlZSWcTifa2trGLdc0DYcOHYLT6UR1dTUuX76c9LpERJRaukGvqiqam5tx/PhxeL1edHV14erVqwlt/H4/QqEQzp49i4MHD+LAgQNJr0tERKmlG/SBQACFhYUoKCiAxWJBVVUVfD5fQhufz4eamhpIkoTy8nIMDAygv78/qXWJiCi1dO+MVRQFsizHp202GwKBwKRtZFmGoihJrTuRWCyGYDCY1A7Qbc8+++xd/1qSfjPxOkVFRfjggw9SWBXR5OcmMPH5yXNz6kpKSu66TDfoNU0bN0+SpKTaJLPuRCwWy6RF03hffPHFhPODwSCPJRnqbucmwPMzXXSDXpZlhMPh+LSiKLBarZO2CYfDsFqtGBkZ0V2XiIhSS7ePvqysDKFQCH19fYjFYvB6vXA4HAltHA4HOjs7oWkaLl68iNzcXFit1qTWJSKi1NK9ojebzWhqakJDQwNUVYXL5UJxcTE8Hg8AwO12o6KiAj09PXA6ncjOzkZra+uk6xIRUfpI2kQd6Qarra1FR0eH0WUIgX2gNJvx/EwP3hlLRCQ4Bj0RkeAY9EREgmPQExEJblZ+GLty5UosXLjQ6DKIiO4bP/zhD9He3j7hslkZ9ERENHPYdUNEJDgGPRGR4Bj0RESCY9ATEQmOQU9EJDgGPRGR4HSfXkn3p9deew3nz59Hfn4+urq6jC6HKIHD4cDcuXNhMpmQkZHBhximGINeULW1tfjZz36GvXv3Gl0K0YROnTqFvLw8o8t4ILDrRlCPPfYY5s2bZ3QZRDQLMOiJyBBbtmxBbW0t/vCHPxhdivDYdUNEaefxeGCz2XD9+nVs3rwZS5YswWOPPWZ0WcLiFT0RpZ3NZgMA5Ofnw+l0IhAIGFyR2Bj0RJRWt27dQiQSif/3n//8Z44lnWLsuhHUrl278Ne//hU3btzAE088ge3bt+P55583uiwiXL9+Hb/85S8BAKqq4plnnsETTzxhcFVi42OKiYgEx64bIiLBMeiJiATHoCciEhyDnohIcAx6IiLBMeiJiATHoCciEtz/AQPTVZfpmg20AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "mutual_info.boxplot()\n", + "sns.despine();" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The below figure shows the mutual information, averaged across the 15 intervals for each indicator. NATR, PPO, and Bollinger Bands are most important from this metric's perspective:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.937686Z", + "start_time": "2021-02-23T19:20:52.551300Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "(mutual_info.groupby(mutual_info.index.to_series().str.split('_').str[-1])[1]\n", + " .mean()\n", + " .sort_values().plot.barh(title='Mutual Information with 1-Day Forward Returns'))\n", + "sns.despine()\n", + "plt.tight_layout()\n", + "plt.savefig(results_path / 'mutual_info_cnn_features', dpi=300)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.941987Z", + "start_time": "2021-02-23T19:20:52.938868Z" + } + }, + "outputs": [], + "source": [ + "best_features = mi_by_indicator.head(15).index" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:52.971896Z", + "start_time": "2021-02-23T19:20:52.943614Z" + } + }, + "outputs": [], + "source": [ + "size = len(best_features)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Hierarchical Feature Clustering" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:20:56.199036Z", + "start_time": "2021-02-23T19:20:52.973214Z" + } + }, + "outputs": [], + "source": [ + "features = pd.concat([features.filter(like=f'_{f}') for f in best_features], axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:21:00.180174Z", + "start_time": "2021-02-23T19:20:56.200032Z" + } + }, + "outputs": [], + "source": [ + "new_cols = {}\n", + "for feature in best_features:\n", + " fnames = sorted(features.filter(like=f'_{feature}').columns.tolist())\n", + " renamed = [f'{i:02}_{feature}' for i in range(1, len(fnames)+ 1)]\n", + " new_cols.update(dict(zip(fnames, renamed)))\n", + "features = features.rename(columns=new_cols).sort_index(1)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "ExecuteTime": { 
+ "end_time": "2021-02-23T19:21:00.194831Z", + "start_time": "2021-02-23T19:21:00.181519Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2378728 entries, ('A', Timestamp('2001-01-02 00:00:00')) to ('ZTS', Timestamp('2017-12-29 00:00:00'))\n", + "Columns: 225 entries, 01_BBH to 15_WMA\n", + "dtypes: float64(225)\n", + "memory usage: 4.1+ GB\n" + ] + } + ], + "source": [ + "features.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Hierarchical Clustering" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As discussed in the first section of this chapter, CNNs rely on the locality of relevant patterns that is typically found in images where nearby pixels are closely related and changes from one pixel to the next are often gradual.\n", + "\n", + "To organize our indicators in a similar fashion, we will follow Sezer and Ozbayoglu's approach of applying hierarchical clustering. The goal is to identify features that behave similarly and order the columns and the rows of the grid accordingly.\n", + "\n", + "We can build on SciPy's `pdist()`, `linkage()`, and `dendrogram()` functions that we introduced in *Chapter 13, Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning* alongside other forms of clustering. \n", + "\n", + "We create a helper function that standardizes the input column-wise to avoid distorting distances among features due to differences in scale, and use the Ward criterion that merges clusters to minimize variance. 
The function\n", + "returns the order of the leaf nodes in the dendrogram that in turn displays the successive formation of larger clusters:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:21:00.214677Z", + "start_time": "2021-02-23T19:21:00.196205Z" + } + }, + "outputs": [], + "source": [ + "def cluster_features(data, labels, ax, title):\n", + " data = StandardScaler().fit_transform(data)\n", + " pairwise_distance = pdist(data)\n", + " Z = linkage(data, 'ward')\n", + " c, coph_dists = cophenet(Z, pairwise_distance)\n", + " dend = dendrogram(Z,\n", + " labels=labels,\n", + " orientation='top',\n", + " leaf_rotation=0.,\n", + " leaf_font_size=8.,\n", + " ax=ax)\n", + " ax.set_title(title)\n", + " return dend['ivl']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To obtain the optimized order of technical indicators in the columns and the different intervals in the rows, we use NumPy's `.reshape()` method to ensure that the dimension we would like to cluster appears in the columns of the two-dimensional array we pass to `cluster_features()`." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:21:57.861182Z", + "start_time": "2021-02-23T19:21:00.215792Z" + } + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "fig, axes = plt.subplots(figsize=(15, 4), ncols=2)\n", + "\n", + "labels = sorted(best_features)\n", + "title = 'Column Features: Indicators'\n", + "col_order = cluster_features(features.dropna().values.reshape(-1, 15).T,\n", + " labels,\n", + " axes[0],\n", + " title)\n", + "\n", + "labels = list(range(1, 16))\n", + "title = 'Row Features: Indicator Parameters'\n", + "row_order = cluster_features(\n", + " features.dropna().values.reshape(-1, 15, 15).transpose((0, 2, 1)).reshape(-1, 15).T,\n", + " labels, axes[1], title)\n", + "axes[0].set_xlabel('Indicators')\n", + "axes[1].set_xlabel('Parameters')\n", + "sns.despine()\n", + "fig.tight_layout()\n", + "fig.savefig(results_path / 'cnn_clustering', dpi=300)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We reorder the features accordingly and store the result as inputs for the CNN that we will create in the next step.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:21:57.869035Z", + "start_time": "2021-02-23T19:21:57.863039Z" + } + }, + "outputs": [], + "source": [ + "feature_order = [f'{i:02}_{j}' for i in row_order for j in col_order]" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:21:58.596134Z", + "start_time": "2021-02-23T19:21:57.871605Z" + } + }, + "outputs": [], + "source": [ + "features = features.loc[:, feature_order]" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:22:56.029721Z", + "start_time": "2021-02-23T19:21:58.597104Z" + } + }, + "outputs": [], + "source": [ + "features = features.apply(pd.to_numeric, downcast='float')" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "ExecuteTime": { + "end_time": 
"2021-02-23T19:22:56.069245Z", + "start_time": "2021-02-23T19:22:56.030777Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2378728 entries, ('A', Timestamp('2001-01-02 00:00:00')) to ('ZTS', Timestamp('2017-12-29 00:00:00'))\n", + "Columns: 225 entries, 01_CMO to 11_WMA\n", + "dtypes: float32(225)\n", + "memory usage: 2.0+ GB\n" + ] + } + ], + "source": [ + "features.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-23T19:22:58.021376Z", + "start_time": "2021-02-23T19:22:56.070350Z" + } + }, + "outputs": [], + "source": [ + "features.to_hdf('data.h5', 'img_data')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/20_autoencoders_for_conditional_risk_factors/05_conditional_autoencoder_for_asset_pricing_data.ipynb b/20_autoencoders_for_conditional_risk_factors/05_conditional_autoencoder_for_asset_pricing_data.ipynb new file mode 100644 index 000000000..179ad9c6d --- /dev/null +++ b/20_autoencoders_for_conditional_risk_factors/05_conditional_autoencoder_for_asset_pricing_data.ipynb @@ -0,0 +1,1893 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Conditional Autoencoder for Asset Pricing - Part 1: The Data" + ] + }, + 
{ + "cell_type": "code", + "execution_count": 1, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:31.951910Z", + "start_time": "2021-02-24T15:07:31.351143Z" + } + }, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "from statsmodels.regression.rolling import RollingOLS\n", + "import statsmodels.api as sm\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:31.954639Z", + "start_time": "2021-02-24T15:07:31.953021Z" + } + }, + "outputs": [], + "source": [ + "idx = pd.IndexSlice\n", + "sns.set_style('whitegrid')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:31.963074Z", + "start_time": "2021-02-24T15:07:31.955671Z" + } + }, + "outputs": [], + "source": [ + "results_path = Path('results', 'asset_pricing')\n", + "if not results_path.exists():\n", + " results_path.mkdir(parents=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prices" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:33.621475Z", + "start_time": "2021-02-24T15:07:31.963898Z" + } + }, + "outputs": [], + "source": [ + "prices = pd.read_hdf(results_path / 'data.h5', 'stocks/prices/adjusted')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:33.932624Z", + "start_time": "2021-02-24T15:07:33.622337Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 17661451 entries, ('A', Timestamp('1999-11-18 00:00:00')) to ('ZYXI', Timestamp('2019-12-31 
00:00:00'))\n", + "Data columns (total 5 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 close 17661451 non-null float64\n", + " 1 high 17661451 non-null float64\n", + " 2 low 17661451 non-null float64\n", + " 3 open 17661451 non-null float64\n", + " 4 volume 17661451 non-null float64\n", + "dtypes: float64(5)\n", + "memory usage: 742.0+ MB\n" + ] + } + ], + "source": [ + "prices.info(show_counts=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Metadata" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:33.997645Z", + "start_time": "2021-02-24T15:07:33.933520Z" + } + }, + "outputs": [], + "source": [ + "metadata = pd.read_hdf(results_path / 'data.h5', 'stocks/info').rename(columns=str.lower)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:34.006833Z", + "start_time": "2021-02-24T15:07:33.998994Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Index: 6262 entries, A to ZYXI\n", + "Columns: 109 entries, zip to impliedsharesoutstanding\n", + "dtypes: bool(2), float64(75), int64(3), object(29)\n", + "memory usage: 5.2+ MB\n" + ] + } + ], + "source": [ + "metadata.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Select tickers with metadata" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:34.019711Z", + "start_time": "2021-02-24T15:07:34.007871Z" + } + }, + "outputs": [], + "source": [ + "sectors = metadata.sector.value_counts().loc[lambda counts: counts > 50].index" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:34.028440Z", + "start_time": "2021-02-24T15:07:34.020656Z" + } + }, + "outputs": [], + "source": [ 
+ "tickers_with_errors = ['FTAI', 'AIRT', 'CYBR', 'KTB']" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:34.042811Z", + "start_time": "2021-02-24T15:07:34.029471Z" + } + }, + "outputs": [], + "source": [ + "tickers_with_metadata = metadata[metadata.sector.isin(sectors) & \n", + " metadata.marketcap.notnull() &\n", + " metadata.sharesoutstanding.notnull() & \n", + " (metadata.sharesoutstanding > 0)].index.drop(tickers_with_errors)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:07:34.056060Z", + "start_time": "2021-02-24T15:07:34.044017Z" + } + }, + "outputs": [], + "source": [ + "metadata = metadata.loc[tickers_with_metadata, ['sector', 'sharesoutstanding', 'marketcap']]\n", + "metadata.index.name = 'ticker'" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:30:43.035689Z", + "start_time": "2021-02-24T15:07:34.057118Z" + } + }, + "outputs": [], + "source": [ + "prices = prices.loc[idx[tickers_with_metadata, :], :]" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:30:43.353647Z", + "start_time": "2021-02-24T15:30:43.036510Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " prices.info(null_counts=True)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 17312229 entries, ('A', Timestamp('1999-11-18 00:00:00')) to ('ZYXI', Timestamp('2019-12-31 00:00:00'))\n", + "Data columns (total 5 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 close 17312229 non-null float64\n", + " 1 high 17312229 non-null float64\n", + " 2 low 17312229 non-null float64\n", + " 3 open 17312229 non-null float64\n", + " 4 volume 17312229 non-null float64\n", + "dtypes: float64(5)\n", + "memory usage: 727.4+ MB\n" + ] + } + ], + "source": [ + "prices.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:30:43.361769Z", + "start_time": "2021-02-24T15:30:43.354651Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Index: 5749 entries, A to ZYXI\n", + "Data columns (total 3 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 sector 5749 non-null object \n", + " 1 sharesoutstanding 5749 non-null float64\n", + " 2 marketcap 5749 non-null float64\n", + "dtypes: float64(2), object(1)\n", + "memory usage: 179.7+ KB\n" + ] + } + ], + "source": [ + "metadata.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:30:49.145765Z", + "start_time": "2021-02-24T15:30:43.362775Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "DatetimeIndex: 7559 entries, 1990-01-02 to 2019-12-31\n", + "Columns: 4420 entries, A to ZYXI\n", + "dtypes: float64(4420)\n", + "memory usage: 255.0 MB\n" + ] + } + ], + "source": [ + "close = prices.close.unstack('ticker').sort_index()\n", + "close.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + 
"metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:30:55.056033Z", + "start_time": "2021-02-24T15:30:49.146672Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "DatetimeIndex: 7559 entries, 1990-01-02 to 2019-12-31\n", + "Columns: 4420 entries, A to ZYXI\n", + "dtypes: float64(4420)\n", + "memory usage: 255.0 MB\n" + ] + } + ], + "source": [ + "volume = prices.volume.unstack('ticker').sort_index()\n", + "volume.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create weekly returns" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:01.377951Z", + "start_time": "2021-02-24T15:30:55.057292Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "DatetimeIndex: 1565 entries, 1990-01-12 to 2020-01-03\n", + "Freq: W-FRI\n", + "Columns: 4420 entries, A to ZYXI\n", + "dtypes: float64(4420)\n", + "memory usage: 52.8 MB\n" + ] + } + ], + "source": [ + "returns = (prices.close\n", + " .unstack('ticker')\n", + " .resample('W-FRI').last()\n", + " .sort_index().pct_change().iloc[1:])\n", + "returns.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:01.380485Z", + "start_time": "2021-02-24T15:31:01.378804Z" + } + }, + "outputs": [], + "source": [ + "dates = returns.index" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:01.581772Z", + "start_time": "2021-02-24T15:31:01.381581Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/stefan/.pyenv/versions/miniconda3-latest/envs/ml4t-dl/lib/python3.8/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. 
Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).\n", + " warnings.warn(msg, FutureWarning)\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "sns.distplot(returns.count(1), kde=False);" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:01.874242Z", + "start_time": "2021-02-24T15:31:01.582715Z" + } + }, + "outputs": [], + "source": [ + "with pd.HDFStore(results_path / 'autoencoder.h5') as store:\n", + " store.put('close', close)\n", + " store.put('volume', volume)\n", + " store.put('returns', returns)\n", + " store.put('metadata', metadata)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Factor Engineering" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:01.876666Z", + "start_time": "2021-02-24T15:31:01.875143Z" + } + }, + "outputs": [], + "source": [ + "MONTH = 21" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Price Trend" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Short-Term Reversal" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1-month cumulative return" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:01.885711Z", + "start_time": "2021-02-24T15:31:01.878828Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "DatetimeIndex(['1990-01-12', '1990-01-19', '1990-01-26', '1990-02-02',\n", + " '1990-02-09'],\n", + " dtype='datetime64[ns]', name='date', freq='W-FRI')" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dates[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:02.511175Z", + "start_time": "2021-02-24T15:31:01.887161Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + 
"text": [ + "\n", + "MultiIndex: 3580621 entries, (Timestamp('1990-02-02 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Dtype \n", + "--- ------ ----- \n", + " 0 mom1m float64\n", + "dtypes: float64(1)\n", + "memory usage: 41.2+ MB\n" + ] + } + ], + "source": [ + "mom1m = close.pct_change(periods=MONTH).resample('W-FRI').last().stack().to_frame('mom1m')\n", + "mom1m.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:02.540510Z", + "start_time": "2021-02-24T15:31:02.512142Z" + } + }, + "outputs": [], + "source": [ + "mom1m.squeeze().to_hdf(results_path / 'autoencoder.h5', 'factor/mom1m')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Stock Momentum" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "11-month cumulative returns ending 1-month before month end" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:03.146297Z", + "start_time": "2021-02-24T15:31:02.541311Z" + } + }, + "outputs": [], + "source": [ + "mom12m = (close\n", + " .pct_change(periods=11 * MONTH)\n", + " .shift(MONTH)\n", + " .resample('W-FRI')\n", + " .last()\n", + " .stack()\n", + " .to_frame('mom12m'))" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:03.196560Z", + "start_time": "2021-02-24T15:31:03.147146Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3375489 entries, (Timestamp('1991-01-04 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 mom12m 3375489 non-null 
float64\n", + "dtypes: float64(1)\n", + "memory usage: 38.8+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. Use show_counts instead\n", + " mom12m.info(null_counts=True)\n" + ] + } + ], + "source": [ + "mom12m.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:03.225828Z", + "start_time": "2021-02-24T15:31:03.197502Z" + } + }, + "outputs": [], + "source": [ + "mom12m.to_hdf(results_path / 'autoencoder.h5', 'factor/mom12m')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Momentum Change" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cumulative return from months t-6 to t-1 minus months t-12 to t-7." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:04.175407Z", + "start_time": "2021-02-24T15:31:03.226841Z" + } + }, + "outputs": [], + "source": [ + "chmom = (close\n", + " .pct_change(periods=6 * MONTH)\n", + " .sub(close.pct_change(periods=6 * MONTH).shift(6 * MONTH))\n", + " .resample('W-FRI')\n", + " .last()\n", + " .stack()\n", + " .to_frame('chmom'))" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:04.224124Z", + "start_time": "2021-02-24T15:31:04.176245Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3375489 entries, (Timestamp('1991-01-04 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 chmom 3375489 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 38.8+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + 
"text": [ + ":1: FutureWarning: null_counts is deprecated. Use show_counts instead\n", + " chmom.info(null_counts=True)\n" + ] + } + ], + "source": [ + "chmom.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:04.253178Z", + "start_time": "2021-02-24T15:31:04.225020Z" + } + }, + "outputs": [], + "source": [ + "chmom.to_hdf(results_path / 'autoencoder.h5', 'factor/chmom')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Industry Momentum" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Equal-weighted avg. industry 12-month returns" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:05.205923Z", + "start_time": "2021-02-24T15:31:04.254036Z" + } + }, + "outputs": [], + "source": [ + "indmom = (close.pct_change(12*MONTH)\n", + " .resample('W-FRI')\n", + " .last()\n", + " .stack()\n", + " .to_frame('close')\n", + " .join(metadata[['sector']]).groupby(['date', 'sector'])\n", + " .close.mean()\n", + " .to_frame('indmom')\n", + " .reset_index())" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:05.214051Z", + "start_time": "2021-02-24T15:31:05.206759Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 18495 entries, 0 to 18494\n", + "Data columns (total 3 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 date 18495 non-null datetime64[ns]\n", + " 1 sector 18495 non-null object \n", + " 2 indmom 18495 non-null float64 \n", + "dtypes: datetime64[ns](1), float64(1), object(1)\n", + "memory usage: 433.6+ KB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " indmom.info(null_counts=True)\n" + ] + } + ], + "source": [ + "indmom.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:06.153300Z", + "start_time": "2021-02-24T15:31:05.215068Z" + } + }, + "outputs": [], + "source": [ + "indmom = (returns\n", + " .stack()\n", + " .to_frame('ret')\n", + " .join(metadata[['sector']])\n", + " .reset_index()\n", + " .merge(indmom)\n", + " .set_index(['date', 'ticker'])\n", + " .loc[:, ['indmom']])" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:06.164686Z", + "start_time": "2021-02-24T15:31:06.154163Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3551199 entries, (Timestamp('1991-01-04 00:00:00'), 'AA') to (Timestamp('2020-01-03 00:00:00'), 'ZTR')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 indmom 3551199 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 40.8+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " indmom.info(null_counts=True)\n" + ] + } + ], + "source": [ + "indmom.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:06.193427Z", + "start_time": "2021-02-24T15:31:06.165684Z" + } + }, + "outputs": [], + "source": [ + "indmom.to_hdf(results_path / 'autoencoder.h5', 'factor/indmom')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Recent Max Return" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Maximum of the rolling one-month (21-day) returns observed over the most recent 21 trading days" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:07.585474Z", + "start_time": "2021-02-24T15:31:06.194373Z" + } + }, + "outputs": [], + "source": [ + "maxret = (close\n", + " .pct_change(periods=MONTH)\n", + " .rolling(21)\n", + " .max()\n", + " .resample('W-FRI')\n", + " .last()\n", + " .stack()\n", + " .to_frame('maxret'))" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:07.633250Z", + "start_time": "2021-02-24T15:31:07.586352Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3562402 entries, (Timestamp('1990-03-02 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 maxret 3562402 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 41.0+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " maxret.info(null_counts=True)\n" + ] + } + ], + "source": [ + "maxret.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:07.662862Z", + "start_time": "2021-02-24T15:31:07.634151Z" + } + }, + "outputs": [], + "source": [ + "maxret.to_hdf(results_path / 'autoencoder.h5', 'factor/maxret')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Long-Term Reversal" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cumulative returns months t-36 to t-13." + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:08.227020Z", + "start_time": "2021-02-24T15:31:07.663799Z" + } + }, + "outputs": [], + "source": [ + "mom36m = (close\n", + " .pct_change(periods=24*MONTH)\n", + " .shift(12*MONTH)\n", + " .resample('W-FRI')\n", + " .last()\n", + " .stack()\n", + " .to_frame('mom36m'))" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:08.266863Z", + "start_time": "2021-02-24T15:31:08.227869Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2967391 entries, (Timestamp('1993-01-01 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 mom36m 2967391 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 34.2+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " mom36m.info(null_counts=True)\n" + ] + } + ], + "source": [ + "mom36m.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:08.293411Z", + "start_time": "2021-02-24T15:31:08.267936Z" + } + }, + "outputs": [], + "source": [ + "mom36m.to_hdf(results_path / 'autoencoder.h5', 'factor/mom36m')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Liquidity Metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Turnover" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Average daily trading volume over the most recent three months (63 trading days), scaled by the number of shares outstanding; we use the most recent share count from Yahoo Finance rather than point-in-time values" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:09.804423Z", + "start_time": "2021-02-24T15:31:08.294902Z" + } + }, + "outputs": [], + "source": [ + "turn = (volume\n", + " .rolling(3*MONTH)\n", + " .mean()\n", + " .resample('W-FRI')\n", + " .last()\n", + " .div(metadata.sharesoutstanding)\n", + " .stack('ticker')\n", + " .to_frame('turn'))" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:09.852968Z", + "start_time": "2021-02-24T15:31:09.805188Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3506569 entries, (Timestamp('1990-03-30 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 turn 3506569 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 40.3+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: 
null_counts is deprecated. Use show_counts instead\n", + " turn.info(null_counts=True)\n" + ] + } + ], + "source": [ + "turn.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:09.882475Z", + "start_time": "2021-02-24T15:31:09.854048Z" + } + }, + "outputs": [], + "source": [ + "turn.to_hdf(results_path / 'autoencoder.h5', 'factor/turn')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Turnover Volatility" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Monthly std dev of daily share turnover" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:18.197599Z", + "start_time": "2021-02-24T15:31:09.883665Z" + } + }, + "outputs": [], + "source": [ + "turn_std = (prices\n", + " .volume\n", + " .unstack('ticker')\n", + " .div(metadata.sharesoutstanding)\n", + " .rolling(MONTH)\n", + " .std()\n", + " .resample('W-FRI')\n", + " .last()\n", + " .stack('ticker')\n", + " .to_frame('turn_std'))" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:18.226370Z", + "start_time": "2021-02-24T15:31:18.198490Z" + } + }, + "outputs": [], + "source": [ + "turn_std.to_hdf(results_path / 'autoencoder.h5', 'factor/turn_std')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Log Market Equity" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Natural log of market cap at end of month t-1" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:19.713130Z", + "start_time": "2021-02-24T15:31:18.227239Z" + } + }, + "outputs": [], + "source": [ + "last_price = close.ffill()\n", + "factor = close.div(last_price.iloc[-1])\n", + "mvel = 
np.log1p(factor.mul(metadata.marketcap).resample('W-FRI').last()).stack().to_frame('mvel')" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:19.767170Z", + "start_time": "2021-02-24T15:31:19.713957Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3597636 entries, (Timestamp('1990-01-05 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 mvel 3597636 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 41.4+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. Use show_counts instead\n", + " mvel.info(null_counts=True)\n" + ] + } + ], + "source": [ + "mvel.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:19.807236Z", + "start_time": "2021-02-24T15:31:19.768465Z" + } + }, + "outputs": [], + "source": [ + "mvel.to_hdf(results_path / 'autoencoder.h5', 'factor/mvel')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Dollar Volume" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Natural log of trading volume times price per share from month t-2" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:19.875033Z", + "start_time": "2021-02-24T15:31:19.808087Z" + } + }, + "outputs": [], + "source": [ + "dv = close.mul(volume)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:21.166359Z", + "start_time": "2021-02-24T15:31:19.875995Z" + } + }, + "outputs": [], + "source": [ + "dolvol 
= (np.log1p(dv.rolling(21)\n", + " .mean()\n", + " .shift(21)\n", + " .resample('W-FRI')\n", + " .last())\n", + " .stack()\n", + " .to_frame('dolvol'))" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:21.193943Z", + "start_time": "2021-02-24T15:31:21.167174Z" + } + }, + "outputs": [], + "source": [ + "dolvol.to_hdf(results_path / 'autoencoder.h5', 'factor/dolvol')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Amihud Illiquidity" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Average of daily (absolute return / dollar volume)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:22.668882Z", + "start_time": "2021-02-24T15:31:21.194934Z" + } + }, + "outputs": [], + "source": [ + "ill = (close.pct_change().abs()\n", + " .div(dv)\n", + " .rolling(21)\n", + " .mean()\n", + " .resample('W-FRI').last()\n", + " .stack()\n", + " .to_frame('ill'))" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:22.719439Z", + "start_time": "2021-02-24T15:31:22.669807Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3210773 entries, (Timestamp('1990-02-02 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 ill 3210773 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 36.9+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " ill.info(null_counts=True)\n" + ] + } + ], + "source": [ + "ill.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:22.748882Z", + "start_time": "2021-02-24T15:31:22.720722Z" + } + }, + "outputs": [], + "source": [ + "ill.to_hdf(results_path / 'autoencoder.h5', 'factor/ill')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Risk Measures" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Return Volatility" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Standard dev of daily returns from month t-1." + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:24.135283Z", + "start_time": "2021-02-24T15:31:22.749843Z" + } + }, + "outputs": [], + "source": [ + "retvol = (close.pct_change()\n", + " .rolling(21)\n", + " .std()\n", + " .resample('W-FRI')\n", + " .last()\n", + " .stack()\n", + " .to_frame('retvol'))" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:24.185387Z", + "start_time": "2021-02-24T15:31:24.136130Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 3580621 entries, (Timestamp('1990-02-02 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 retvol 3580621 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 41.2+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " retvol.info(null_counts=True)\n" + ] + } + ], + "source": [ + "retvol.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:24.217951Z", + "start_time": "2021-02-24T15:31:24.186306Z" + } + }, + "outputs": [], + "source": [ + "retvol.to_hdf(results_path / 'autoencoder.h5', 'factor/retvol')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Market Beta" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Estimated market beta from weekly returns and equal weighted market returns for 3 years ending month t-1 with at least 52 weeks of returns." + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:24.501234Z", + "start_time": "2021-02-24T15:31:24.218804Z" + } + }, + "outputs": [], + "source": [ + "index = close.resample('W-FRI').last().pct_change().mean(1).to_frame('x')" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:31:24.504378Z", + "start_time": "2021-02-24T15:31:24.502062Z" + } + }, + "outputs": [], + "source": [ + "def get_market_beta(y, x=index):\n", + " df = x.join(y.to_frame('y')).dropna()\n", + " model = RollingOLS(endog=df.y, \n", + " exog=sm.add_constant(df[['x']]),\n", + " window=3*52)\n", + "\n", + " return model.fit(params_only=True).params['x']" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:32:27.608190Z", + "start_time": "2021-02-24T15:31:24.505305Z" + } + }, + "outputs": [], + "source": [ + "beta = (returns.dropna(thresh=3*52, axis=1)\n", + " .apply(get_market_beta).stack().to_frame('beta'))" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:32:27.652151Z", + "start_time": 
"2021-02-24T15:32:27.609024Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2969406 entries, (Timestamp('1993-01-01 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 beta 2969406 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 34.2+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. Use show_counts instead\n", + " beta.info(null_counts=True)\n" + ] + } + ], + "source": [ + "beta.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:32:27.677282Z", + "start_time": "2021-02-24T15:32:27.652927Z" + } + }, + "outputs": [], + "source": [ + "beta.to_hdf(results_path / 'autoencoder.h5', 'factor/beta')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Beta Squared" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Market beta squared" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:32:27.710494Z", + "start_time": "2021-02-24T15:32:27.678160Z" + } + }, + "outputs": [], + "source": [ + "betasq = beta.beta.pow(2).to_frame('betasq')" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:32:27.726198Z", + "start_time": "2021-02-24T15:32:27.712119Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2969406 entries, (Timestamp('1993-01-01 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", 
+ "--- ------ -------------- ----- \n", + " 0 betasq 2969406 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 34.2+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. Use show_counts instead\n", + " betasq.info(null_counts=True)\n" + ] + } + ], + "source": [ + "betasq.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:32:27.757491Z", + "start_time": "2021-02-24T15:32:27.727756Z" + } + }, + "outputs": [], + "source": [ + "betasq.to_hdf(results_path / 'autoencoder.h5', 'factor/betasq')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Idiosyncratic Return Volatility" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Standard deviation of the residuals from a regression of weekly returns on the returns of an equal-weighted market index, estimated over the prior three years." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This takes a while: the rolling regressions ran for just under three hours in the recorded execution."
+ ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T15:32:27.762385Z", + "start_time": "2021-02-24T15:32:27.760235Z" + } + }, + "outputs": [], + "source": [ + "def get_ols_residuals(y, x=index):\n", + " df = x.join(y.to_frame('y')).dropna()\n", + " model = sm.OLS(endog=df.y, exog=sm.add_constant(df[['x']]))\n", + " result = model.fit()\n", + " return result.resid.std()" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T18:27:18.180440Z", + "start_time": "2021-02-24T15:32:27.763774Z" + } + }, + "outputs": [], + "source": [ + "idiovol = (returns.apply(lambda x: x.rolling(3 * 52)\n", + " .apply(get_ols_residuals)))" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T18:27:18.310136Z", + "start_time": "2021-02-24T18:27:18.181902Z" + } + }, + "outputs": [], + "source": [ + "idiovol = idiovol.stack().to_frame('idiovol')" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T18:27:18.360087Z", + "start_time": "2021-02-24T18:27:18.311025Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "MultiIndex: 2969406 entries, (Timestamp('1993-01-01 00:00:00', freq='W-FRI'), 'AA') to (Timestamp('2020-01-03 00:00:00', freq='W-FRI'), 'ZYXI')\n", + "Data columns (total 1 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 idiovol 2969406 non-null float64\n", + "dtypes: float64(1)\n", + "memory usage: 34.2+ MB\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":1: FutureWarning: null_counts is deprecated. 
Use show_counts instead\n", + " idiovol.info(null_counts=True)\n" + ] + } + ], + "source": [ + "idiovol.info(null_counts=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": { + "ExecuteTime": { + "end_time": "2021-02-24T18:27:18.394953Z", + "start_time": "2021-02-24T18:27:18.360999Z" + } + }, + "outputs": [], + "source": [ + "idiovol.to_hdf(results_path / 'autoencoder.h5', 'factor/idiovol')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.8" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/installation/linux/ml4t-backtest.yml b/installation/linux/ml4t-backtest.yml new file mode 100644 index 000000000..81092575e --- /dev/null +++ b/installation/linux/ml4t-backtest.yml @@ -0,0 +1,204 @@ +name: backtest +channels: + - defaults + - conda-forge +dependencies: + - _libgcc_mutex=0.1=conda_forge + - _openmp_mutex=4.5=1_gnu + - argon2-cffi=20.1.0=py36h27cfd23_1 + - attrs=20.3.0=pyhd3eb1b0_0 + - binutils_impl_linux-64=2.35.1=h193b22a_2 + - binutils_linux-64=2.35=h67ddf6f_30 + - bleach=3.3.0=pyhd3eb1b0_0 + - bottleneck=1.3.2=py36heb32a55_1 + - ca-certificates=2021.1.19=h06a4308_0 + - certifi=2020.12.5=py36h06a4308_0 + - cffi=1.14.5=py36h261ae71_0 + - cycler=0.10.0=py36_0 + - dbus=1.13.18=hb2f20db_0 + - decorator=4.4.2=pyhd3eb1b0_0 + - defusedxml=0.6.0=pyhd3eb1b0_0 + - entrypoints=0.3=py36_0 + - expat=2.2.10=he6710b0_2 + 
- fontconfig=2.13.1=hba837de_1004 + - freetype=2.10.4=h5ab3b9f_0 + - gcc_impl_linux-64=9.3.0=h70c0ae5_18 + - gcc_linux-64=9.3.0=hf25ea35_30 + - gettext=0.19.8.1=h9b4dc7a_1 + - glib=2.66.7=h9c3ff4c_0 + - glib-tools=2.66.7=h9c3ff4c_0 + - gst-plugins-base=1.18.3=h04508c2_0 + - gstreamer=1.18.3=h3560a44_0 + - gxx_impl_linux-64=9.3.0=hd87eabc_18 + - gxx_linux-64=9.3.0=h3fbe746_30 + - icu=68.1=h2531618_0 + - importlib-metadata=3.7.0=py36h5fab9bb_0 + - importlib_metadata=3.7.0=hd8ed1ab_0 + - ipykernel=5.5.0=py36he448a4c_1 + - ipython=5.8.0=py36_1 + - ipython_genutils=0.2.0=pyhd3eb1b0_1 + - ipywidgets=7.6.3=pyhd3eb1b0_1 + - jinja2=2.11.3=pyhd3eb1b0_0 + - jpeg=9d=h36c2ea0_0 + - jsonschema=3.2.0=py_2 + - jupyter=1.0.0=py36_7 + - jupyter_client=6.1.11=pyhd8ed1ab_1 + - jupyter_console=5.2.0=py36_1 + - jupyter_contrib_core=0.3.3=py_2 + - jupyter_core=4.7.1=py36h06a4308_0 + - jupyter_highlight_selected_word=0.2.0=py36h5fab9bb_1002 + - jupyter_latex_envs=1.4.6=py36h9f0ad1d_1001 + - jupyter_nbextensions_configurator=0.4.1=py36h5fab9bb_2 + - jupyterlab_widgets=1.0.0=pyhd3eb1b0_1 + - kernel-headers_linux-64=2.6.32=h77966d4_13 + - kiwisolver=1.3.1=py36h2531618_0 + - krb5=1.17.2=h926e7f8_0 + - lcms2=2.12=hddcbb42_0 + - ld_impl_linux-64=2.35.1=hea4e1c9_2 + - libblas=3.9.0=8_openblas + - libcblas=3.9.0=8_openblas + - libclang=11.0.1=default_ha53f305_1 + - libedit=3.1.20191231=h14c3975_1 + - libevent=2.1.10=hcdb4288_3 + - libffi=3.3=he6710b0_2 + - libgcc-devel_linux-64=9.3.0=h7864c58_18 + - libgcc-ng=9.3.0=h2828fa1_18 + - libgfortran-ng=9.3.0=hff62375_18 + - libgfortran5=9.3.0=hff62375_18 + - libglib=2.66.7=h1f3bc88_0 + - libgomp=9.3.0=h2828fa1_18 + - libiconv=1.16=h516909a_0 + - liblapack=3.9.0=8_openblas + - libllvm11=11.0.1=hf817b99_0 + - libopenblas=0.3.12=pthreads_h4812303_1 + - libpng=1.6.37=hbc83047_0 + - libpq=13.1=hfd2b0eb_1 + - libsodium=1.0.18=h7b6447c_0 + - libstdcxx-devel_linux-64=9.3.0=hb016644_18 + - libstdcxx-ng=9.3.0=h6de172a_18 + - libtiff=4.2.0=hdc55705_0 + - 
libuuid=2.32.1=h7f98852_1000 + - libwebp-base=1.2.0=h27cfd23_0 + - libxcb=1.14=h7b6447c_0 + - libxkbcommon=1.0.3=he3ba5ed_0 + - libxml2=2.9.10=h72842e0_3 + - libxslt=1.1.33=h15afd5d_2 + - lxml=4.6.2=py36h04a5ba7_1 + - lz4-c=1.9.3=h2531618_0 + - markupsafe=1.1.1=py36h7b6447c_0 + - matplotlib=3.3.4=py36h06a4308_0 + - matplotlib-base=3.3.4=py36h62a2d02_0 + - mistune=0.8.4=py36h7b6447c_0 + - mysql-common=8.0.23=ha770c72_1 + - mysql-libs=8.0.23=h935591d_1 + - nbformat=5.1.2=pyhd3eb1b0_1 + - ncurses=6.2=he6710b0_1 + - nbconvert=5.6.1=py36h9f0ad1d_1 # avoid warnings on notebook launch + - nb_conda_kernels + - notebook + - nspr=4.29=h9c3ff4c_1 + - nss=3.62=hb5efdd6_0 + - numpy=1.18.1=py36h95a1406_0 + - olefile=0.46=py36_0 + - openssl=1.1.1j=h27cfd23_0 + - packaging=20.9=pyhd3eb1b0_0 + - pandoc=2.11.4=h7f98852_0 + - pandocfilters=1.4.3=py36h06a4308_1 + - pcre=8.44=he6710b0_0 + - pexpect=4.8.0=pyhd3eb1b0_3 + - pickleshare=0.7.5=pyhd3eb1b0_1003 + - pillow=8.1.0=py36he98fc37_0 + - pip=19.2.3 + - prometheus_client=0.9.0=pyhd3eb1b0_0 + - prompt_toolkit=1.0.15=py_1 + - ptyprocess=0.7.0=pyhd3eb1b0_2 + - pycparser=2.20=py_2 + - pygments=2.8.0=pyhd3eb1b0_0 + - pyparsing=2.4.7=pyhd3eb1b0_0 + - pyqt=5.12.3=py36h5fab9bb_7 + - pyqt-impl=5.12.3=py36h7ec31b9_7 + - pyqt5-sip=4.19.18=py36hc4f0c31_7 + - pyqtchart=5.12=py36h7ec31b9_7 + - pyqtwebengine=5.12.1=py36h7ec31b9_7 + - pyrsistent=0.17.3=py36h7b6447c_0 + - python=3.6.13=hffdb5ce_0_cpython + - python-dateutil=2.8.1=pyhd3eb1b0_0 + - python_abi=3.6=1_cp36m + - pytz=2021.1=pyhd3eb1b0_0 + - pyyaml=5.4.1=py36h27cfd23_1 + - pyzmq=22.0.3=py36h81c33ee_0 + - qt=5.12.9=hda022c4_4 + - qtconsole=5.0.2=pyhd3eb1b0_0 + - qtpy=1.9.0=py_0 + - readline=8.1=h27cfd23_0 + - send2trash=1.5.0=pyhd3eb1b0_1 + - setuptools=52.0.0=py36h06a4308_0 + - simplegeneric=0.8.1=py36_2 + - six=1.15.0=py36h06a4308_0 + - sqlite=3.34.0=h74cdb3f_0 + - sysroot_linux-64=2.12=h77966d4_13 + - terminado=0.9.2=py36h06a4308_0 + - testpath=0.4.4=pyhd3eb1b0_0 + - tk=8.6.10=hbc83047_0 +
- tornado=6.1=py36h27cfd23_0 + - traitlets=4.3.3=py36_0 + - typing_extensions=3.7.4.3=pyha847dfd_0 + - wcwidth=0.2.5=py_0 + - webencodings=0.5.1=py36_1 + - wheel=0.36.2=pyhd3eb1b0_0 + - widgetsnbextension=3.5.1=py36_0 + - xz=5.2.5=h7b6447c_0 + - yaml=0.2.5=h7b6447c_0 + - zeromq=4.3.4=h9c3ff4c_0 + - zipp=3.4.0=pyhd3eb1b0_0 + - zlib=1.2.11=h7b6447c_3 + - zstd=1.4.8=ha95c52a_1 + - pip: + - alembic==1.5.5 + - alphalens==0.4.0 + - autopep8==1.5.5 + - bcolz==1.2.1 + - cached-property==1.5.2 + - chardet==4.0.0 + - click==7.1.2 + - cvxpy==1.1.10 + - ecos==2.0.7.post1 + - empyrical==0.5.5 + - h5py==3.1.0 + - idna==2.10 + - intervaltree==3.1.0 + - iso3166==1.0.1 + - iso4217==1.6.20180829 + - joblib==1.0.1 + - logbook==1.5.3 + - lru-dict==1.1.7 + - mako==1.1.4 + - multipledispatch==0.6.0 + - networkx==1.11 + - numexpr==2.7.2 + - osqp==0.6.2.post0 + - pandas==0.22.0 + - pandas-datareader==0.8.1 + - patsy==0.5.1 + - pip==19.2.3 + - pycodestyle==2.6.0 + - pyfolio==0.9.2 + - pyportfolioopt==1.4.1 + - python-editor==1.0.4 + - python-interface==1.6.0 + - qdldl==0.1.5.post0 + - requests==2.25.1 + - scikit-learn==0.24.1 + - scipy==1.5.4 + - scs==2.1.2 + - seaborn==0.10.1 + - sortedcontainers==2.3.0 + - sqlalchemy==1.3.23 + - statsmodels==0.12.2 + - tables==3.6.1 + - threadpoolctl==2.1.0 + - toml==0.10.2 + - toolz==0.11.1 + - trading-calendars==2.1.1 + - urllib3==1.26.3 + - git+https://github.com/stefan-jansen/zipline.git@b33e5c955a58d888f55101874f45cd141c61d3e1#egg=zipline \ No newline at end of file