A pairs trade, or pair trading, is a market-neutral trading strategy that lets traders profit in virtually any market condition: uptrend, downtrend, or sideways movement. The strategy monitors the performance of two historically correlated securities. When the correlation between the two temporarily weakens, i.e. one stock moves up while the other moves down, the pairs trade is to short the outperforming stock and go long the underperforming one, betting that the "spread" between the two will eventually converge. The divergence within a pair can be caused by temporary supply/demand changes, large buy/sell orders for one security, a reaction to important news about one of the companies, and so on.
Choose the best pair, calculate the residual, then forecast the next day's residual to generate buy and sell signals.
- Mean Square Error
Step 1: Ticker list of Indian Stock Market was collected from: https://docs.google.com/spreadsheets/d/1ymMHn4gE9Gjjtvw3bGVlD_9CbdfDm01_cLBRd5nRN_M/edit?usp=sharing
Step 2: Delisted Tickers were Deleted, as were Tickers whose data was only partially available at Yahoo.
Step 3: Data for the remaining tickers was collected from Yahoo using the yfinance library, covering 2008 to 2021.
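from datetime import datetime
import yfinance as yf

def download_data(symbol, source, start_date, end_date):
    # Dates are 'dd-mm-yyyy' strings; `source` is kept in the signature but not used by yfinance
    start = datetime.strptime(start_date, '%d-%m-%Y')
    end = datetime.strptime(end_date, '%d-%m-%Y')
    df = yf.download(symbol, start=start, end=end)
    return df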
Step 4: Only Closing Price Data was kept for each ticker for prediction
Step 1: Use PCA to reduce the dimensionality of each ticker's data
Step 2: Use DBSCAN to cluster stocks that are similar to each other
Step 3: Use t-SNE to visualize whether the resulting clusters make sense
Step 4: Because tickers within the same cluster would be related to each other, the cointegration test will be used to determine which are the best pairs among them.
Step 1: Get the residuals for each ticker that is part of the same cluster.
Step 2: Perform the cointegration test
Step 3: Get the top pairs in each cluster for pair trading
Features Developed:
- Leading Indicators: They inform about the future trend in time series
- Lagging Indicators: They are used to confirm the leading indicators
- RSI (Relative Strength Index)
  - It is a leading momentum indicator that helps identify trend reversals in a time series.
  - It oscillates between 0 and 100.
  - Formula
    - RSI = 100 - 100 / (1 + RS)
    - RS = Average points gained by the stock over a fixed period (at least 14 days) / Average points lost by the stock over the same period
  - When 0 <= RSI <= 20, the stock is considered oversold and ready for an upward correction (buying will start).
  - When 80 <= RSI <= 100, the stock is considered overbought and ready for a downward correction (selling will start).
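The RSI formula can be sketched in plain Python as follows. This is only an illustration: the function name `rsi`, the use of simple (unsmoothed) averages over the last 14 changes, and the choice to return 100 when there are no losses are assumptions, not taken from the project code.

```python
def rsi(closes, period=14):
    """RSI over the last `period` price changes, using simple averages."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    # Day-over-day changes for the most recent `period` days
    changes = [b - a for a, b in zip(closes[-period - 1:-1], closes[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # assumption: no losses in the window maps to the upper bound
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)
```

With equal average gains and losses RS is 1, so the function returns 50; a window made only of falling prices returns 0.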
- Stochastic RSI
  - It is a leading momentum indicator based on RSI.
  - In practice RSI is a slow-moving indicator; Stochastic RSI addresses this by moving rapidly between overbought and oversold.
  - Formula
    - t'-period StochRSI = (Current RSI - Lowest RSI in t' periods) / (Highest RSI in t' periods - Lowest RSI in t' periods)
  - 0 <= StochRSI <= 1
  - A StochRSI reading above 0.8 is considered overbought, while a reading below 0.2 is considered oversold.
  - Overbought doesn't necessarily mean the price will reverse lower, just as oversold doesn't mean the price will reverse higher. Rather, the overbought and oversold conditions simply alert traders that the RSI is near the extremes of its recent readings.
  - When the StochRSI is above 0.50, the security may be seen as trending higher, and vice versa when it is below 0.50.
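As a rough illustration of the StochRSI formula, the sketch below rescales the latest RSI value against the range of recent RSI readings. The function name `stoch_rsi` and the handling of a flat RSI window (returning 0.0) are assumptions made for this example.

```python
def stoch_rsi(rsi_values, period=14):
    """Position of the latest RSI within its range over the last `period` readings."""
    window = rsi_values[-period:]
    if len(window) < period:
        raise ValueError("need at least `period` RSI values")
    lowest, highest = min(window), max(window)
    if highest == lowest:
        return 0.0  # assumption: a flat RSI window has no defined position
    return (window[-1] - lowest) / (highest - lowest)
```

A result near 1 means the current RSI sits at the top of its recent range, a result near 0 means it sits at the bottom.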
- Money Flow Index (MFI)
  - It is a leading volume-based indicator that does the same job as RSI.
  - While RSI considers only price, MFI considers both price and volume.
  - It is also called the volume-weighted RSI.
  - It oscillates between 0 and 100.
  - Formula
    - MFI = 100 - 100 / (1 + MFR)
    - MFR (Money Flow Ratio) = t-period Positive Money Flow / t-period Negative Money Flow, where t = 14 typically
    - Money Flow = Typical Price * Volume traded on that day
    - Typical Price = (High + Low + Close) / 3
  - When 0 <= MFI <= 20, the stock is considered oversold and ready for an upward correction (buying will start).
  - When 80 <= MFI <= 100, the stock is considered overbought and ready for a downward correction (selling will start).
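The MFI calculation can be sketched as below, under the usual convention that a day's money flow counts as positive when its typical price is above the previous day's and negative when it is below. The function name `money_flow_index`, the decision to ignore unchanged days, and the return value of 100 when there is no negative flow are assumptions for illustration.

```python
def money_flow_index(highs, lows, closes, volumes, period=14):
    """MFI over the last `period` days from high, low, close and volume lists."""
    typical = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)]
    if len(typical) < period + 1:
        raise ValueError("need at least period + 1 days of data")
    positive = negative = 0.0
    for i in range(len(typical) - period, len(typical)):
        flow = typical[i] * volumes[i]  # Money Flow = Typical Price * Volume
        if typical[i] > typical[i - 1]:
            positive += flow
        elif typical[i] < typical[i - 1]:
            negative += flow
    if negative == 0:
        return 100.0  # assumption: no negative flow maps to the upper bound
    ratio = positive / negative
    return 100 - 100 / (1 + ratio)
```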
- Accumulation/Distribution Index (ADL)
  - It is a cumulative indicator that uses both volume and price to identify whether a stock is being accumulated (bought) or distributed (sold).
  - In other words, it tracks the cumulative flow of money into and out of a stock and helps identify trend reversals.
  - If the ADL line is going down while the stock is going up, it indicates a possible downward correction, since someone may have sold a large volume of the stock, and vice versa.
  - Formula
    - Money Flow Multiplier (MFM) = [(Close - Low) - (High - Close)] / (High - Low)
    - 0 < MFM <= 1 (positive) when the close is in the upper half of the candlestick, which indicates buying pressure > selling pressure
    - -1 <= MFM < 0 (negative) when the close is in the lower half of the candlestick, which indicates buying pressure < selling pressure
    - Money Flow Volume = Money Flow Multiplier x Volume for the period
    - ADL = Previous ADL + Current period's Money Flow Volume
  - Whether the ADL line moves up or down depends on the sign of the Money Flow Volume.
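A minimal sketch of the ADL recurrence, assuming the line starts at zero and that a bar with High equal to Low contributes nothing (both are choices made for this example, as is the function name `accumulation_distribution`):

```python
def accumulation_distribution(highs, lows, closes, volumes):
    """Cumulative ADL line, one value per bar."""
    adl = 0.0
    line = []
    for high, low, close, volume in zip(highs, lows, closes, volumes):
        if high == low:
            multiplier = 0.0  # assumption: avoid division by zero on a zero-range bar
        else:
            multiplier = ((close - low) - (high - close)) / (high - low)
        adl += multiplier * volume  # add this bar's Money Flow Volume
        line.append(adl)
    return line
```

A close at the high gives a multiplier of +1 and adds the full volume; a close at the low gives -1 and subtracts it.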
- Average True Range (ATR)
  - The average true range is a price volatility indicator showing the average price variation of an asset within a given time period.
  - It is generally calculated over 14-day periods of True Range.
  - Formula
    - ATR = (13 x Previous Day ATR + Current TR) / 14
    - TR = MAX((Today_High - Today_Low), ABS(Today_High - Previous_Close), ABS(Today_Low - Previous_Close))
  - True Range accounts for the gap between the previous day's close and the current day's prices when measuring price movement.
  - ATR only measures volatility; it does not measure the direction of price movement.
  - The higher the ATR value, the higher the volatility, and vice versa.
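The ATR recurrence can be sketched as follows. The function name `average_true_range` is hypothetical, and seeding the first ATR with a simple average of the first `period` true ranges is a common convention assumed here, not something stated in the project.

```python
def average_true_range(highs, lows, closes, period=14):
    """Latest ATR, smoothed as ((period - 1) * previous ATR + current TR) / period."""
    true_ranges = [
        max(high - low, abs(high - prev_close), abs(low - prev_close))
        for high, low, prev_close in zip(highs[1:], lows[1:], closes[:-1])
    ]
    if len(true_ranges) < period:
        raise ValueError("need at least period + 1 days of data")
    atr = sum(true_ranges[:period]) / period  # assumption: seed with a simple average
    for tr in true_ranges[period:]:
        atr = ((period - 1) * atr + tr) / period
    return atr
```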
- Bollinger Bands
  - Bollinger Bands are envelopes plotted at a standard deviation level above and below a simple moving average of the price. Because the distance of the bands is based on standard deviation, they adjust to volatility swings in the underlying price.
  - When the bands tighten during a period of low volatility, it raises the likelihood of a sharp price move in either direction.
  - When the bands separate by an unusually large amount, volatility has increased and any existing trend may be ending.
  - Formula
    - Middle Band = 20-day simple moving average (SMA)
    - Upper Band = 20-day SMA + (20-day standard deviation of price x 2)
    - Lower Band = 20-day SMA - (20-day standard deviation of price x 2)
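The band formulas translate almost directly into code. The sketch below uses the population standard deviation from Python's `statistics` module; whether the project used the population or the sample standard deviation is not stated, so that choice and the function name `bollinger_bands` are assumptions.

```python
from statistics import mean, pstdev

def bollinger_bands(closes, window=20, num_std=2):
    """Return (lower, middle, upper) bands for the most recent `window` closes."""
    recent = closes[-window:]
    if len(recent) < window:
        raise ValueError("need at least `window` closing prices")
    middle = mean(recent)               # simple moving average
    spread = num_std * pstdev(recent)   # assumption: population standard deviation
    return middle - spread, middle, middle + spread
```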
- Exponential Moving Average (EMA)
  - It is a lagging indicator that confirms the trend.
  - More weight is given to recent prices when taking the average.
  - Formula
    - Initial EMA: 10-period sum / 10
    - Multiplier: 2 / (Time periods + 1) = 2 / (10 + 1) = 0.1818 (18.18%)
    - EMA: (Close - EMA(previous day)) x Multiplier + EMA(previous day)
- Moving Average Convergence Divergence (MACD) Indicator
  - As the name suggests, MACD is all about the convergence and divergence of two moving averages. Convergence occurs when the two moving averages move towards each other, and divergence occurs when they move away from each other.
  - Formula
    - MACD = 12-day EMA - 26-day EMA
  - The sign of the MACD indicates the direction of the stock's move: '+' for an upward move and '-' for a downward move.
  - A positive sign is only possible if the 12-day EMA > 26-day EMA. Since the EMA depends more on recent values, the price must then be trending upward.
  - The magnitude of the MACD signifies the strength of the upward or downward trend.
  - When the MACD line crosses the centerline from negative territory to positive territory, it means there is a divergence between the two averages. This is a sign of increasing bullish momentum, so one should look for buying opportunities, and vice versa.
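To make the EMA recurrence and the MACD difference concrete, here is a small sketch in plain Python. The function names `ema` and `macd` are illustrative, and seeding each EMA with a simple average of its first `period` values follows the "Initial EMA" rule from the EMA formula; the alignment of the two EMA series is an implementation detail of this example.

```python
def ema(values, period):
    """EMA series; the first entry is the simple average of the first `period` values."""
    if len(values) < period:
        raise ValueError("need at least `period` values")
    multiplier = 2 / (period + 1)
    current = sum(values[:period]) / period
    series = [current]
    for value in values[period:]:
        current = (value - current) * multiplier + current
        series.append(current)
    return series

def macd(closes, fast=12, slow=26):
    """MACD line (fast EMA minus slow EMA) for every day the slow EMA exists."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    offset = len(fast_ema) - len(slow_ema)  # the fast EMA starts earlier, so skip its head
    return [f - s for f, s in zip(fast_ema[offset:], slow_ema)]
```

For a steadily rising price series the fast EMA stays above the slow EMA, so the MACD values are positive; for a constant series they are zero.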
- Log Return
  - Formula
    - Log Return = log(current_price / previous_price)
- Residual/Spread
  - We use linear regression to fit one stock against the other, calculate the residual, and use it as a feature.
  - Because the pair passed the cointegration test, the residual should be stationary.
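A minimal sketch of the residual calculation, assuming a one-variable ordinary least squares fit of one stock's prices on the other's. The function name `ols_residuals` and the choice of which stock is the regressor are assumptions for illustration.

```python
def ols_residuals(x_prices, y_prices):
    """Fit y = alpha + beta * x by least squares; return (alpha, beta, residuals)."""
    n = len(x_prices)
    if n != len(y_prices) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x_prices) / n
    mean_y = sum(y_prices) / n
    sxx = sum((x - mean_x) ** 2 for x in x_prices)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_prices, y_prices))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # The residual (spread) is the part of y not explained by the fitted line
    residuals = [y - (alpha + beta * x) for x, y in zip(x_prices, y_prices)]
    return alpha, beta, residuals
```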
- Y_Pred
  - The price of the next day is set as Y_Pred.
- FFT
  - All feature-engineered data (except the test data and Y_Pred) was smoothed using the Fast Fourier Transform.
Step 1: Get the Residuals/Spread for best Pairs from cointegration test
Step 2: Model the next day Residual/Spread using ARIMA
- ARIMA uses its own lags and the lagged forecast errors to predict future values in a time series.
- ARIMA is divided into three parts: Auto Regression, Integrated, and Moving Average.
- Auto Regression
- Integrated
  - To make the time series stationary, each value is differenced from its previous value; this is repeated as many times as required.
- Moving Average
- The only hyperparameters in ARIMA are the number of lags to use in Auto Regression, the number of lags to use in Moving Average, and the number of times the differencing (Integrated) is performed.
- The number of lags in Auto Regression can be found by studying the partial autocorrelation plot.
- The number of lags in Moving Average and the number of differences required can be identified using the autocorrelation plot and the ADF test, which determines whether the series is stationary.
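The differencing performed by the Integrated part can be illustrated with a few lines of plain Python. The helper name `difference` is hypothetical; a real ARIMA implementation applies this step internally.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: each value minus its predecessor."""
    for _ in range(order):
        series = [current - previous for previous, current in zip(series, series[1:])]
    return series
```

For example, the series 1, 4, 9, 16, 25 becomes 3, 5, 7, 9 after one difference and the constant series 2, 2, 2 after two.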
ARIMA Result:
ARIMA needed only the residual data of the pairs for prediction; in this section, however, we will use every feature produced in the feature engineering section.
Part 2: Using Elastic Regression
Step 1: Calculate Z-Score using mu=60 day moving average of residual, sigma = 60 day standard deviation of residual and x = current day residual
Step 2: Generate Sell and Buy signals using Z-score calculated in Step-1 for the pairs
Step 3: Create a trading method which triggers sell and buy whenever the signal is observed.
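Steps 1 and 2 can be sketched in plain Python as follows. The thresholds (enter beyond 1, exit inside 0.5) mirror the trading function in this section; the function name `zscore_signals`, the signal labels, and scoring each day against the preceding `window` days are assumptions made for this example.

```python
from statistics import mean, stdev

def zscore_signals(residuals, window=60, entry=1.0, exit_band=0.5):
    """Label each day after the first `window` days as 'sell', 'buy', 'exit' or 'hold'."""
    signals = []
    for i in range(window, len(residuals)):
        history = residuals[i - window:i]      # the `window` days before day i
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            signals.append("hold")             # assumption: no signal on a flat history
            continue
        z = (residuals[i] - mu) / sigma
        if z > entry:
            signals.append("sell")             # spread unusually high
        elif z < -entry:
            signals.append("buy")              # spread unusually low
        elif abs(z) < exit_band:
            signals.append("exit")             # spread back near its mean
        else:
            signals.append("hold")
    return signals
```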
def trade(EQ_1, EQ_2, Residual, test_data_len, share_multiplier, window=60):
    # Rolling mean of the residual: starts from the last `window` days of train data,
    # then each day of test data is added to it
    ratios_mavg = Residual.rolling(window=window,
                                   center=False).mean()[-test_data_len-1:][0:test_data_len]
    # Rolling standard deviation of the residual over the same windows
    std = Residual.rolling(window=window,
                           center=False).std()[-test_data_len-1:][0:test_data_len]
    x = Residual[-test_data_len:].reset_index(drop=True)
    mu = ratios_mavg.reset_index(drop=True)
    sigma = std.reset_index(drop=True)
    z_score = (x - mu) / sigma
    z_score = z_score.reset_index(drop=True)
    EQ_1 = EQ_1.reset_index(drop=True)
    EQ_2 = EQ_2.reset_index(drop=True)
    money = 0
    countEQ1 = 0
    countEQ2 = 0
    for i in range(test_data_len):
        # Sell short: z-score above 1, so buy EQ1 and sell EQ2
        if z_score[i] > 1:
            money += share_multiplier * (EQ_2[i] * x[i] - EQ_1[i])
            countEQ2 -= x[i] * share_multiplier
            countEQ1 += 1 * share_multiplier
        # Buy long: z-score below -1, so sell EQ1 and buy EQ2
        elif z_score[i] < -1:
            money += share_multiplier * (EQ_1[i] - EQ_2[i] * x[i])
            countEQ2 += x[i] * share_multiplier
            countEQ1 -= 1 * share_multiplier
        # Close all positions once the z-score is back within 0.5 of the mean
        elif abs(z_score[i]) < 0.5:
            money += countEQ2 * EQ_2[i] + EQ_1[i] * countEQ1
            countEQ1 = 0
            countEQ2 = 0
    return money
Random Forest and Elastic Regression produced similar results, and both outperformed ARIMA.
Profit at Predicted Price of next day using ARIMA: 6337.889865173798
Profit at Predicted Price of next day using LR: 11304.181844212108
Profit at Predicted Price of next day using RF: 12496.581864440399