- Project Overview
- Data
- Statistical Risk Model
- Alpha Factors
- The Combined Alpha Factor
- Evaluate Alpha Factors
- Optimal Portfolio Constrained by Risk Model
- Libraries
- References
In this project, I will build a statistical risk model using PCA. I’ll use this model to build a portfolio along with 5 alpha factors. I’ll create these factors, then evaluate them using factor-weighted returns, quantile analysis, sharpe ratio, and turnover analysis. At the end of the project, I’ll optimize the portfolio using the risk model and factors using multiple optimization formulations.
For the dataset, we'll be using the end of day from Quotemedia and sector data from Sharadar.
Udacity doesn't have a license to redistribute the data to us. They are working on alternatives to this problem.
Portfolio risk is calculated using this formula:
where:
- X is the portfolio weights (weights assigned to each stock)
- B is the factor betas (exposure of factors)
- F is the factor covariance matrix (combined with factor betas gives systematic risk)
- S is the idiosyncratic variance matrix (specific risk)
After calculating the profile risk, the following five alpha factors were created:
- Momentum 1 Year Factor [2]
Each factor has a hypothesis that goes with it. For this factor, it is "Higher past 12-month (252 days) returns are proportional to future return". Using that hypothesis, we've generate this code:
from zipline.pipeline.factors import Returns
def momentum_1yr(window_length, universe, sector):
return Returns(window_length=window_length, mask=universe) \
.demean(groupby=sector) \
.rank() \
.zscore()
- Mean Reversion 5 Day Sector Neutral Factor [1]
I have implemented mean_reversion_5day_sector_neutral
using the hypothesis "Short-term outperformers (underperformers) compared to their sector will revert." Using the returns data from universe, demean using the sector data to partition, rank, then converted to a zscore.
def mean_reversion_5day_sector_neutral(window_length, universe, sector):
"""
Generate the mean reversion 5 day sector neutral factor
Parameters
----------
window_length : int
Returns window length
universe : Zipline Filter
Universe of stocks filter
sector : Zipline Classifier
Sector classifier
Returns
-------
factor : Zipline Factor
Mean reversion 5 day sector neutral factor
"""
return -Returns(window_length=window_length, mask = universe)\
.demean(groupby=sector)\
.rank()\
.zscore()
- Mean Reversion 5 Day Sector Neutral Smoothed Factor
Taking the output of the previous factor, we create a smoothed version. mean_reversion_5day_sector_neutral_smoothed
generates a mean reversion 5 day sector neutral smoothed factor. Calling the mean_reversion_5day_sector_neutral function to get the unsmoothed factor, then using SimpleMovingAverage
function to smooth it. We'll have to apply rank and zscore again.
from zipline.pipeline.factors import SimpleMovingAverage
def mean_reversion_5day_sector_neutral_smoothed(window_length, universe, sector):
"""
Generate the mean reversion 5 day sector neutral smoothed factor
Parameters
----------
window_length : int
Returns window length
universe : Zipline Filter
Universe of stocks filter
sector : Zipline Classifier
Sector classifier
Returns
-------
factor : Zipline Factor
Mean reversion 5 day sector neutral smoothed factor
"""
mean_reversion = mean_reversion_5day_sector_neutral(window_length, universe, sector)
return SimpleMovingAverage(inputs=[mean_reversion], window_length = window_length).rank().zscore()
- Overnight Sentiment Factor [1]
For this factor, were using the hypothesis from the paper Overnight Returns and Firm-Specific Investor Sentiment.
from zipline.pipeline.data import USEquityPricing
class CTO(Returns):
"""
Computes the overnight return, per hypothesis from
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2554010
"""
inputs = [USEquityPricing.open, USEquityPricing.close]
def compute(self, today, assets, out, opens, closes):
"""
The opens and closes matrix is 2 rows x N assets, with the most recent at the bottom.
As such, opens[-1] is the most recent open, and closes[0] is the earlier close
"""
out[:] = (opens[-1] - closes[0]) / closes[0]
class TrailingOvernightReturns(Returns):
"""
Sum of trailing 1m O/N returns
"""
window_safe = True
def compute(self, today, asset_ids, out, cto):
out[:] = np.nansum(cto, axis=0)
def overnight_sentiment(cto_window_length, trail_overnight_returns_window_length, universe):
cto_out = CTO(mask=universe, window_length=cto_window_length)
return TrailingOvernightReturns(
inputs=[cto_out],window_length=trail_overnight_returns_window_length
)\
.rank().zscore()
- Overnight Sentiment Smoothed
Just like the implemented factor, we'll also smooth this factor.
def overnight_sentiment_smoothed(cto_window_length, trail_overnight_returns_window_length, universe):
unsmoothed_factor = overnight_sentiment(cto_window_length, trail_overnight_returns_window_length, universe)
return SimpleMovingAverage(
inputs=[unsmoothed_factor], window_length=trail_overnight_returns_window_length
) \
.rank() \
.zscore()
With all the factor implementations done, let's add them to a zipline pipeline.
universe = AverageDollarVolume(window_length=120).top(500)
sector = project_helper.Sector()
pipeline = Pipeline(screen=universe)
pipeline.add(
momentum_1yr(252, universe, sector),
'Momentum_1YR')
pipeline.add(
mean_reversion_5day_sector_neutral(5, universe, sector),
'Mean_Reversion_5Day_Sector_Neutral')
pipeline.add(
mean_reversion_5day_sector_neutral_smoothed(5, universe, sector),
'Mean_Reversion_5Day_Sector_Neutral_Smoothed')
pipeline.add(
overnight_sentiment(2, 5, universe),
'Overnight_Sentiment')
pipeline.add(
overnight_sentiment_smoothed(2, 5, universe),
'Overnight_Sentiment_Smoothed')
all_factors = engine.run_pipeline(pipeline, factor_start_date, universe_end_date)
# all_factors.head()
Note: We're evaluating the alpha factors using delay of 1
Let's view the factor returns over time. It looks like moving up and to the right.
It is not enough to look just at the factor weighted return. A good alpha is also monotonic in quantiles. Let's looks the basis points for the factor returns.
Observations:
- None of these alphas are strictly monotonic; this should lead you to question why this is? Further research and refinement of the alphas needs to be done. What is it about these alphas that leads to the highest ranking stocks in all alphas except MR 5D smoothed to not perform the best.
- The majority of the return is coming from the short side in all these alphas. The negative return in quintile 1 is very large in all alphas. This could also a cause for concern becuase when you short stocks, you need to locate the short; shorts can be expensive or not available at all.
- If you look at the magnitude of the return spread (i.e., Q1 minus Q5), we are working with daily returns in the 0.03%, i.e., 3 basis points, neighborhood before all transaction costs, shorting costs, etc.. Assuming 252 days in a year, that's 7.56% return annualized. Transaction costs may cut this in half. As such, it should be clear that these alphas can only survive in an institutional setting and that leverage will likely need to be applied in order to achieve an attractive return.
Without doing a full and formal backtest, we can analyze how stable the alphas are over time. Stability in this sense means that from period to period, the alpha ranks do not change much. Since trading is costly, we always prefer, all other things being equal, that the ranks do not change significantly per period. We can measure this with the factor rank autocorrelation (FRA).
The last analysis we'll do on the factors will be sharpe ratio. Function sharpe_ratio
calculate the sharpe ratio of factor returns.
def sharpe_ratio(factor_returns, annualization_factor):
"""
Get the sharpe ratio for each factor for the entire period
Parameters
----------
factor_returns : DataFrame
Factor returns for each factor and date
annualization_factor: float
Annualization Factor
Returns
-------
sharpe_ratio : Pandas Series of floats
Sharpe ratio
"""
return annualization_factor * np.mean(factor_returns)/np.std(factor_returns, ddof=1)
Let's see what the sharpe ratio for the factors are. Generally, a Sharpe Ratio of near 1.0 or higher is an acceptable single alpha for this universe.
Observation:
Sharpe Ratio of 1.13 for momentum factor is good but if we look at the auto-correlation plots, FRA for momentum factor looks stable. So smoothing the momentum factor will not have any significant change.
To use these alphas in a portfolio, we need to combine them somehow so we get a single score per stock. This is a area where machine learning can be very helpful. In this module, however, we will take the simplest approach of combination: simply averaging the scores from each alpha.
This is the list of contraints that will optimize against:
Where x is the portfolio weights, B is the factor betas, and r is the portfolio risk
The first constraint is that the predicted risk be less than some maximum limit. The second and third constraints are on the maximum and minimum portfolio factor exposures. The fourth constraint is the "market neutral constraint: the sum of the weights must be zero. The fifth constraint is the leverage constraint: the sum of the absolute value of the weights must be less than or equal to 1.0. The last are some minimum and maximum limits on individual holdings.
Weights generated after applying those constraints:
Yikes. It put most of the weight in a few stocks.
This is the weights distribution after applying regularization to the objective function.
Nice. Well diverfied.
Another common formulation is to take a predefined target weighting(e.g., a quantile portfolio), and solve to get as close to that portfolio while respecting portfolio-level constraints.
This project used Python 3.6.3. The necessary libraries are mentioned in requirements.txt
: