-
Notifications
You must be signed in to change notification settings - Fork 0
/
class_3.Rmd
308 lines (220 loc) · 8.73 KB
/
class_3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
---
title: "Basic Time Series Constructions"
subtitle: "Moving Average and Auto Regression"
output: html_document
---
```{r,message=FALSE}
source('library.R')
```
A time series is a a set of data points measured over time. Generally the measurements are over equaly
spaced time intervals such as days, weeks, months, or years. One example could be the increase or
decrease in an investment after purchase. The measurement would start at zero then change each day.
The mathematical notation uses $X_t$ to index an obvervatrion $X$ at time $t$.
t (days) | $X_t$ Change from buying price | Concrete value
----------|--------------------------------|---------------
0 | 0 | 0
1 | $\epsilon_1$ | 1.2
2 | ? | ?
The stock is bought and increases in value by 1.2 on the first day.
Short term gain or loss in the commodities market is thought to be due to random variation
and previous market values.
The random variation is represented by $\epsilon_t$ and is called the error term at time $t$.
It is a random variable drawn from the normal distribution with a mean zero and unknown
variance.
In the ARIMA model, here are three groups of values that go into each day's commoditiy price:
the current time period's error term $\epsilon_t$, all the previous time periods valuations $X_0 \dots X_{t-1}$,
and all the previous time periods error terms $\epsilon_0 \dots \epsilon_{t-1}$. ARIMA is the specification for parameterizing
a function of the later two groups of values to predict new observations or fit current observations.
Future error terms $\epsilon_{t+i}$ cannot be predicted, and interestingly enough, current and
past error terms can't be observed.
$\hat{X_t}$
But we can measure the actual commodity value, which includes that error term, and subtract
out our prediction to get a predition of the error term $\epsilon_t = X_t - \hat{X_t}$
ARIMA time series
At each time period, there is the addition of this random normal realization, but also
the previous random effects
But what is the model for $t=2$? We know that the value it closes at
after the first day has to be the value it starts at the next. But how will that
information be incorporated into a mathematical model? For starters, we can just use a
portion of the $\epsilon_1$ value.
One method of modeling time series data is called ARIMA.
This method combines three ways to introduce parameters into a model used to forcast
future observations or describe observed observations. The proportion of the previous error
can be represented by $\theta$. Here is an example for the case where $\theta=0.4$.
t (days) | $X_t$ Change from buying price | Notation | $\theta_1$ | $\epsilon$ | Concrete value
----------|------------------------------------|-----------|------------|-----------|---------
0 | 0 | $X_0$ | N/A | N/A | 0
1 | $\epsilon_1$ | $X_1$ | N/A | 1.2 | 1.2
2 | $\theta_1 \epsilon_1 + \epsilon_2$ | $X_2$ | 0.4 | -0.5 | `r 0.4*1.2 -0.5`
This model show above is $ X_t = 0.4 \epsilon_{t-1} + \epsilon_t $
Each observation is just a portion of the error in the previous observation plus the new error amount.
The problem, the actual error cannot be measured. Only the resulting loss or gain in the stock
can be measured.
1. **A**uto **R**egressive
2. **I**ntegrated (Differences)
3. **M**oving **A**verage
Let $X_i$ measure the distance a point is from where is starts.
Time Series
: An ordered set of responses equally spaced in time. For example, observations recorded
every day, week, or month.
$Y_i = \text{A function of the previous observations and previous errors} + \text{current error}$
Auto Regressive
: Future observations are modeled as a function of previous observations.
Integrated
: The differences of an observation minus a prevous observations is modeled.
```{r}
install.packages('ggfortify')
install.packages('sweep')
```
```{r}
library(tidyquant)
library(broom)
library(sweep)
library(forecast)
library(magrittr)
library(ggplot2)
library(ggfortify)
library(xts)
library(tidyverse)
```
# Goals for Monday April 2.
1. Simulate ARIMA time series, and estimate the parameters.
2. Experiment with the length of the time series to see how long a time series needs to be to
generate accurate parameter estimates.
3. Experiment with the frequency and use seasonal parameters to estimate.
4. Simulate some series that are not stable, estimate them.
5. Talk about the mathematical representation of a series and the roots of $\phi(L)$
If the roots are outside the unit circle, then the series is stable.
6. Talk about what seasonality is.
7. Load various time series, fit them.
8. Plot times series and their fits.
9. Use tests to stationarity.
Apply the tests illustrated by Hila in http://www.statosphere.com.au/check-time-series-stationary-r/
10. Use `diff` to calculate differences.
## Create a simulated time series as time series object and as an xts object and compare.
* There are three estimates,
1. The estimate of the times series
2. The estimate of the xts object
3. The estimate resulting from auto.arima
* The frequency must be set to 4 to get quarterly data.
* The ats estimate will not be correct unless the attribute of frequency is also set.
* All three estimate will be the same if the length of the time series is long enough.
Exercise: change this to simulate a 12 month, yearly cycle. Experiement with the length of the
series. How short can you make it and still get auto.arima to work correctly?
What happens if the frequency is not specified correctly?
Does the length needed to get auto.arima depend on the magnitude of the seasonal
and not seasonal parameters?
The resolution of the simulated time series can be
* year = "Y"
* month = "m"
* day = 'd'
* hour = "H"
* mins = 'M',
* secs = 'S
Simulate a time series that is observed every day instead of every month using '2006//d'.
```{r}
set.seed(125)
l=10000
dates <-timeBasedSeq('20060204//m',length=l)
parameters <- list( model=list(ar=c(0,0,0,0.8), ma=c(0.5)), n=l)
y <- ts(do.call(arima.sim,parameters),frequency=4)
fit_y <- arima(y, order=c(0,0,1), seasonal=list(order=c(1,0,0)), include.mean=FALSE)
c(coef(fit_y), sigma2 = fit_y$sigma2)
x <-xts(y,order.by = dates,descr="Simulated xts object")
colnames(x) <- c("rate")
attr(x,'frequency')
# Fit without the frequency. Seasonal adjustment will be wrong.
fit_auto <- auto.arima(x)
broom::tidy(fit_auto)
sweep::sw_tidy(fit_auto)
sweep::sw_tidy(x)
attr(x,'frequency') <-4
attr(x,'frequency')
fit_x <- arima(x, order=c(0,0,1), seasonal=list(order=c(1,0,0)),include.mean=FALSE)
c(coef(fit_x), sigma2 = fit_x$sigma2)
fit_auto <- auto.arima(x)
broom::tidy(fit_auto)
install.packages('tsdecomp')
library(tsdecomp)
roots.allocation(fit_auto)
```
Create a non stationary the needs one difference to stablize.
```{r}
set.seed(125)
l=10000
dates <-timeBasedSeq('20060204//m',length=l)
parameters <- list( model=list(order=c(12,1,3),ar=c(0,0,0,0,0,0,0,0,0,0,0,0.4), ma=c(0.5,0,.1)), n=l)
y <- ts(do.call(arima.sim,parameters),frequency=12)
fit_y <- arima(y, order=c(0,1,3), seasonal=list(order=c(1,0,0)), include.mean=FALSE)
c(coef(fit_y), sigma2 = fit_y$sigma2)
sweep::sw_tidy(fit_y)
sweep::sw_glance(fit_y)
x <-xts(y,order.by = dates,descr="Simulated xts object")
colnames(x) <- c("rate")
attr(x,'frequency')
# Fit without the frequency. Seasonal adjustment will be wrong.
fit_auto <- auto.arima(x)
broom::tidy(fit_auto)
sweep::sw_tidy(fit_auto)
sweep::sw_tidy(x)
attr(x,'frequency') <-4
attr(x,'frequency')
fit_x <- arima(x, order=c(0,0,1), seasonal=list(order=c(1,0,0)),include.mean=FALSE)
c(coef(fit_x), sigma2 = fit_x$sigma2)
fit_auto <- auto.arima(x)
broom::tidy(fit_auto)
install.packages('tsdecomp')
library(tsdecomp)
roots.allocation(fit_auto)
```
```{r}
Acf(y)
```
```{r}
Pacf(y)
```
```{r}
Box.test(y,lag=5,type="Ljung-Box")
```
```{r}
tseries::adf.test(y,alternative="stationary")
```
```{r}
tseries::kpss.test(y)
```
```{r}
print(dates)
print(y_0)
print(y)
print(x)
1
```
It turns out sweep is the broom package for forcasts.
http://www.business-science.io/code-tools/2017/07/09/sweep-0-1-0.html
```{r}
tidyquant::tq_get_options()
```
```{r}
from <- today() - years(1)
AAPL <- tidyquant::tq_get("AAPL",get = "stock.prices",from = from)
```
```{r}
plot(AAPL)
AAPL %>%
select(date, close) %>%
tq_mutate( mutate_fun = arima(order=c(1,0,0)))
```
# tq_get
```{r}
oil <- tq_get("DCOILWTICO", get = "economic.data")
names(oil)
dim(oil)
str(oil)
head(oil)
oil %>% na.omit() %>% pacf
gold <- tq_get("GOLDAMGBD228NLBM", get = "economic.data")
plot(gold)
gold %>% na.omit() %>% pacf
```
```{r}
```