---
title: "Time Series Forecasting"
author: "K. Lass"
date: "1/12/2022"
output:
  pdf_document: default
  html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Import libraries
library('forecast')  # auto.arima(), Arima(), forecast(), Acf(), Pacf()
library('stats')     # Box.test() and base time series utilities
library('tseries')   # adf.test()
```
## Data
Series: Federal Government: Current Expenditures
Web Address: (https://fred.stlouisfed.org/series/NA000283Q)
(1) The federal government budget includes three types of spending: mandatory, discretionary, and interest on the national debt. The majority of federal expenditures are mandatory spending on Social Security, Medicare, and Medicaid.
(a) Units: Millions of Dollars, Not Seasonally Adjusted
(b) Frequency: Quarterly
(c) Number of Observations: 48
(i) Duration: Q4 2006 - Q3 2018
(ii) Within-Sample: The area of focus is Q4 2006 - Q3 2016 (40 observations)
(iii) Post-Sample: This leaves 8 quarters (Q4 2016 - Q3 2018) after the area of focus for post-sample testing
```{r data}
# Read and attach the data
data=read.csv("/stfm/dev2/m1kal01/Personal/data.csv")
attach(data)
# Generate a quarterly time series from the Expenditure column
Expenditure_Quarterly=ts(Expenditure, frequency=4)
print(Expenditure_Quarterly)
# Split the series into within-sample (first 40 quarters) and post-sample (last 8 quarters)
N=length(Expenditure_Quarterly)
WithinSampleLength=N-8
WithinSample=ts(Expenditure_Quarterly[1:WithinSampleLength],frequency=4,start=c(2006,4))
print(WithinSample)
PostTest=Expenditure_Quarterly[(WithinSampleLength+1):N]
print(PostTest)
```
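As an aside, the same split can also be expressed with `window()` when the start date is attached to the series at creation time; a minimal sketch is below. The `Expenditure_Quarterly_dated`, `WithinSampleAlt`, and `PostTestAlt` names are illustrative, not part of the analysis above.
```{r}
# Hedged alternative sketch: date the series at creation (Q4 2006, per the data
# description) and use window() to perform the within-sample / post-sample split.
Expenditure_Quarterly_dated <- ts(Expenditure, frequency = 4, start = c(2006, 4))
WithinSampleAlt <- window(Expenditure_Quarterly_dated, end = c(2016, 3))    # 40 quarters
PostTestAlt <- window(Expenditure_Quarterly_dated, start = c(2016, 4))      # 8 quarters
```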
## Plot
```{r Expenditure_Quarterly, echo=FALSE}
plot(Expenditure_Quarterly, main="Federal Government: Current Expenditures", ylab="Millions of Dollars")
```
## Purpose
When modeling a series, you must start by testing for stationarity. This is done by looking at the behavior of the error term: we want to know whether the errors (or the series itself) are autocorrelated, so that the behavior of the variable today is affected by its behavior in the past. If the autocorrelation coefficient $\rho$ is equal to 1, the error term behaves as $\varepsilon_t = \varepsilon_{t-1} + u_t$. This is a random walk model, and its variance grows without bound over time. In terms of the difference equation, the time path and the general solution of this variable will not converge to an equilibrium; the model is purely random. This is a unit root, which can be detected with a Dickey-Fuller test, and it must be dealt with by differencing the series.

Once the series is stationary, it must be tested to see whether it is simply white noise, which can be checked with a Ljung-Box test. If the series is truly white noise, it cannot be used to forecast. If it is not, examining the autocorrelation function (ACF) and partial autocorrelation function (PACF) will reveal whether the model should be an AR, MA, or ARMA process.
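To make the unit-root idea concrete, here is a small illustrative sketch (not part of the original analysis): simulating a random walk $\varepsilon_t = \varepsilon_{t-1} + u_t$ and confirming that the augmented Dickey-Fuller test fails to reject a unit root for it.
```{r}
# Illustrative sketch only: simulate a random walk and run the ADF test on it.
# Because the series has a unit root, the p-value should be large (fail to
# reject the null of a unit root).
set.seed(1)
u <- rnorm(200)            # white-noise shocks u_t
random_walk <- cumsum(u)   # e_t = e_{t-1} + u_t
adf.test(random_walk, alternative = 'stationary')
```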
## Test for Stationarity
First, an augmented Dickey-Fuller test for a unit root was run on the within-sample series.
```{r}
# adf.test() is the augmented Dickey-Fuller test
adf.test(WithinSample,alternative='stationary')
```
The results of this test:
A large p-value is indicative of a unit root process, meaning the series is not stationary; a non-stationary series is dealt with by differencing. The null hypothesis of the ADF test is that a unit root is present, so here we fail to reject that null and must difference the series.
Differencing the series:
```{r}
WithinSampleDiff1=diff(WithinSample, differences=1)
plot(WithinSampleDiff1)
```
Dickey-Fuller Test for Stationarity on First-Differenced Series:
```{r}
adf.test(WithinSampleDiff1,alternative='stationary')
```
The results of this test:
We still fail to reject the null hypothesis of a unit root, so the series is differenced again.
Differencing the series again:
```{r}
WithinSampleDiff2=diff(WithinSample, differences=2)
plot(WithinSampleDiff2)
```
Dickey-Fuller Test for Stationarity on Second-Differenced Series
```{r}
adf.test(WithinSampleDiff2,alternative='stationary')
```
The results of this test:
We now reject the null hypothesis of a unit root: the second-differenced series is stationary.
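As described in the Purpose section, a stationary series should also be checked for white noise before modeling. The original analysis does not run this step, so the Ljung-Box check below is a hedged sketch; the lag of 8 is an illustrative choice.
```{r}
# Hedged sketch: Ljung-Box test for white noise on the stationary
# (second-differenced) series. A small p-value indicates remaining
# autocorrelation, i.e. the series is not white noise and an ARMA-type
# model is worth fitting. lag = 8 is an illustrative choice.
Box.test(WithinSampleDiff2, lag = 8, type = "Ljung-Box")
```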
## Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)
```{r}
Acf(WithinSampleDiff2, main='Autocorrelation Function of the Second-Differenced Series')
Pacf(WithinSampleDiff2, main='Partial Autocorrelation Function of the Second-Differenced Series')
```
From visual inspection, the autocorrelations appear to tail off after lag 3.
My guess: an IMA(2,4), i.e. an ARIMA(0,2,4), fitted directly below for comparison.
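The guessed specification can be fitted directly so its fit statistics can be compared with the auto.arima() selection that follows; this is a hedged sketch, not part of the original analysis, and `GuessFit` is an illustrative name.
```{r}
# Hedged sketch: fit the guessed IMA(2,4), i.e. ARIMA(p=0, d=2, q=4), with
# forecast::Arima() so its AIC can be compared against auto.arima() below.
GuessFit <- Arima(WithinSample, order = c(0, 2, 4))
print(GuessFit)
```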
## Auto-ARIMA
auto.arima() selects a model by fitting many possible combinations of ARIMA orders and choosing the one with the smallest AIC (Akaike Information Criterion).
```{r}
auto.arima(WithinSample,seasonal=TRUE)
```
I disagree with this auto.arima() selection: it differences the series only once, even though the first-differenced series still appeared non-stationary.
In response, I forced the model to use two differences by setting d = 2.
```{r}
ARIMAWithin=auto.arima(WithinSample,d=2,seasonal=TRUE)
print(ARIMAWithin)
```
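A quick residual diagnostic is a natural follow-up to the fit; the sketch below is not part of the original analysis and simply uses checkresiduals() from the forecast package.
```{r}
# Hedged sketch: residual diagnostics for the forced d = 2 model.
# checkresiduals() plots the residuals and their ACF and runs a Ljung-Box
# test; a large p-value suggests the residuals resemble white noise.
checkresiduals(ARIMAWithin)
```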
## Forecasting with Auto-ARIMA
```{r}
# h is the number of quarters ahead to forecast
ARIMAWithinYhats=forecast(ARIMAWithin,h=8)
print(ARIMAWithinYhats)
# Point forecasts, extracted by name rather than by list position
ActualForecastValues=ARIMAWithinYhats$mean
print(ActualForecastValues)
# Held-out post-sample values for comparison
PostValues=PostTest[1:8]
print(PostValues)
```
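For a visual check, the forecast and its prediction intervals can be plotted with the held-out post-sample values overlaid; this is a hedged sketch, not part of the original analysis, and the post-sample dating (starting Q4 2016) follows the data description above.
```{r}
# Hedged sketch: plot the 8-quarter-ahead forecast with prediction intervals
# and overlay the actual post-sample values (dated from Q4 2016) in red.
plot(ARIMAWithinYhats, main = "Forecast vs. post-sample actuals")
lines(ts(PostValues, frequency = 4, start = c(2016, 4)), col = "red")
```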
Getting post-sample residuals:
```{r}
# Forecast errors over the post-sample period (point forecast minus actual)
ARIMAResidualsTest=ARIMAWithinYhats$mean-PostTest
print(ARIMAResidualsTest)
```
Getting the post-sample RMSE:
```{r}
sqrt(mean(ARIMAResidualsTest^2))
```
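As a cross-check (not in the original analysis), forecast::accuracy() reports RMSE alongside other error measures for both the training fit and the post-sample period; the test-set RMSE should match the value computed above.
```{r}
# Hedged cross-check: accuracy() compares the forecasts to the held-out values
# and reports RMSE, MAE, MAPE, etc. for the training and test sets.
accuracy(ARIMAWithinYhats, PostValues)
```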