---
title: \vspace{3in}"Machine Learning Project"
author: "Xiaoyan Lu, Anya Conti, Khushbakht Irfanullah, Xueyan Hou"
date: "12 December 2018"
output:
  pdf_document:
    number_sections: true
  html_document: default
---
\centering
\newpage
\tableofcontents
\newpage
#### Introduction
The coordinate descent algorithm is a way of 'solving optimization problems by performing minimization along coordinate directions or coordinate hyperplanes'. It aims to minimize a function over a large set of variables by optimising one coordinate at a time while keeping all the other coordinates fixed. Coordinate descent has various applications, particularly within machine learning, because of its easy implementation and scalability. In this report we implement coordinate descent algorithms to solve the Lasso and the Elastic Net problems.
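As a toy illustration of the idea (this example is our own and is not part of the project code), coordinate descent on a smooth convex function of two variables simply cycles through exact one-dimensional minimisations:
```{r}
# Toy example (ours): minimise f(b1, b2) = (b1 - 1)^2 + (b2 + 2)^2 + 0.5*b1*b2
# by repeatedly minimising over one coordinate while holding the other fixed.
b <- c(0, 0)
for(it in 1:50){
  b[1] <- 1 - 0.25 * b[2]    # exact minimiser of f over b1 with b2 fixed
  b[2] <- -2 - 0.25 * b[1]   # exact minimiser of f over b2 with b1 fixed
}
b  # converges to the joint minimiser (1.6, -2.4)
```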
#### Lasso
The Lasso (Least Absolute Shrinkage and Selection Operator) is a powerful method that performs two main tasks: variable selection and regularisation. It does this by applying a shrinking process, penalising the coefficients of the regression variables and shrinking some of them to zero. Hence, the Lasso yields sparse models by using an l1 norm penalty, which forces some of the coefficient estimates to be exactly zero when the tuning parameter $\lambda$ is large enough. This in turn helps minimise the prediction error of the model. We are now going to use the coordinate descent algorithm to solve the Lasso problem, which requires us to minimize the following with respect to $\beta_j$:
$$\frac{1}{2n}\sum_{i=1}^n\Big(y_i-\sum_{j=1}^px_{ij}\beta_j\Big)^2 + \lambda\sum_{j=1}^p|\beta_j|$$
The method we are going to implement is as follows:

1. Initialise the coefficients of all the variables to zero, i.e. $\beta_j = 0$ for $j = 1,\dots,p$.
2. Repeat the following steps until convergence:
    a. Compute the partial residuals, i.e. $r_{ij} = y_i - \sum_{k\neq j}x_{ik}\beta_k$.
    b. Compute the least squares coefficient of these residuals, i.e. $\beta_j^*=\frac{1}{n}\sum_{i=1}^nx_{ij}r_{ij}$.
    c. Update the coefficients by soft thresholding, i.e. $\beta_j=\text{sign}(\beta_j^*)(|\beta_j^*|-\lambda)_+$.
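The soft-thresholding update in step (c) can be written as a stand-alone operator. The helper below is a small sketch of our own (the function name is not part of the project code) showing how it shrinks small values exactly to zero:
```{r}
# Soft-thresholding operator S(z, gamma) = sign(z) * (|z| - gamma)_+
# (illustrative helper of our own, mirroring the update used in step (c)).
softThreshold <- function(z, gamma){
  sign(z) * pmax(abs(z) - gamma, 0)
}
softThreshold(c(-2, -0.3, 0.1, 1.5), gamma = 0.5)  # -1.5  0.0  0.0  1.0
```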
We implemented the Lasso in R with the following function:
```{r}
coordDescAlg.Lasso <- function(lambda, y, x){
  x <- as.matrix(x)
  converge <- FALSE                    # initialising converge as FALSE
  n <- dim(x)[1]                       # number of observations (rows)
  p <- dim(x)[2]                       # number of variables (columns)
  BetaHat <- rep(0, p)                 # initialising BetaHat
  BetaHat.star <- rep(0, p)            # initialising BetaHat.star
  newBetaHat <- rep(0, p)              # initialising newBetaHat
  r <- matrix(rep(0, n*p), ncol = p)   # creating the partial-residual matrix
  while(!converge){
    for(j in 1:p){
      r[,j] <- y - x[,-j] %*% BetaHat[-j]                             # computing partial residuals
      BetaHat.star[j] <- (t(x[,j]) %*% r[,j]) / (t(x[,j]) %*% x[,j])  # least squares coefficient of the residuals
      newBetaHat[j] <- sign(BetaHat.star[j]) *
        pmax(abs(BetaHat.star[j]) - n*(lambda / (t(x[,j]) %*% x[,j])), 0)  # updating newBetaHat by soft thresholding
      if(all(round(BetaHat, 5) == round(newBetaHat, 5))){              # stop once the coefficients stop changing
        converge <- TRUE
        break
      } else {
        BetaHat <- newBetaHat
      }
    }
  }
  return(BetaHat)
}
```
Note that, rather than requiring the columns of $X$ to be standardised in advance, each update divides by $x_j^Tx_j$, so the function can be applied to unstandardised data directly.
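As a quick illustration of how the function might be called (the simulated data, seed and value of $\lambda$ below are our own arbitrary choices, not part of the project's analysis):
```{r}
# Small illustrative call on simulated data (data, seed and lambda are ours).
set.seed(1)
xDemo <- matrix(rnorm(50 * 4), ncol = 4)
yDemo <- xDemo %*% c(2, 0, -1, 0) + rnorm(50)
coordDescAlg.Lasso(lambda = 0.5, y = yDemo, x = xDemo)
```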
The Lasso can be advantageous in many situations because of its ability to shrink and remove redundant coefficients, which helps provide good accuracy. Moreover, the Lasso increases model interpretability as it eliminates irrelevant variables that are not associated with the response variable, thereby reducing overfitting. It further improves on Ridge Regression by using the l1 norm penalty to actually provide us with a subset of variables from the model, that is, to carry out successful variable selection. However, despite these advantages, there are certain situations in which the Lasso does not perform well.
Firstly, when faced with a very complex model where the number of dimensions is greater than the number of observations, that is, p > n, the Lasso ends up selecting at most n variables before it saturates, which may mean that relevant variables are left out of the model. Secondly, in situations where a group of variables has very high pairwise correlation, the Lasso tends to select only one variable from the group, which might result in poor selection as all the other variables in that group are ignored. Finally, even though the Lasso dominates Ridge Regression for variable selection when n > p, it falls behind in prediction performance.
#### Elastic Net
The Elastic Net enables us to overcome these limitations of the Lasso by combining elements of both the Lasso and Ridge Regression. It does this by adding a quadratic element to the penalty, using both the l1 and l2 norm penalties, which removes the limitation on the number of selected variables and also stabilises the selection among grouped variables. Hence, like the Lasso, it provides sparse solutions, but it also has the advantage of performing well with highly correlated variables.
We are now going to use the coordinate descent algorithm to solve the Elastic Net problem, which follows a very similar method to the Lasso. However, for the Elastic Net, we need to minimize the following with respect to $\beta_j$:
$$\frac{1}{2n}\sum_{i=1}^n\Big(y_i-\sum_{j=1}^px_{ij}\beta_j\Big)^2 + \lambda_1\sum_{j=1}^p|\beta_j| + \lambda_2\sum_{j=1}^p\beta_j^2$$
Differentiating with respect to $\beta_j$ (taking $\text{sign}(\beta_j)$ as the subgradient of $|\beta_j|$) and setting the result to zero gives
$$-\frac{1}{n}\sum_{i=1}^n\Big(y_i - \sum_{k=1}^px_{ik}\beta_k\Big)x_{ij} + \lambda_1\,\text{sign}(\beta_j) + 2\lambda_2\beta_j = 0.$$
Writing the partial residuals as $r_{ij} = y_i - \sum_{k\neq j}x_{ik}\beta_k$ and assuming standardised predictors, so that $\frac{1}{n}\sum_{i=1}^nx_{ij}^2 = 1$, this becomes
$$-\frac{1}{n}\sum_{i=1}^nr_{ij}x_{ij} + \beta_j + \lambda_1\,\text{sign}(\beta_j) + 2\lambda_2\beta_j = 0,$$
that is, with $\beta_j^*=\frac{1}{n}\sum_{i=1}^nx_{ij}r_{ij}$,
$$\beta_j(1 + 2\lambda_2) = \beta_j^* - \lambda_1\,\text{sign}(\beta_j).$$
Solving this by soft thresholding (Zou and Hastie, 2005) then gives us:
$$\beta_j=\frac{\text{sign}(\beta_j^*)(|\beta_j^*|-\lambda_1)_+}{1 + 2\lambda_2}$$
We then implemented this Elastic Net Algorithm in R with the following code:
```{r}
coordDescAlg.ElNet <- function(lambda, y, x){
  # lambda has two components: lambda[1] (l1 penalty) and lambda[2] (l2 penalty)
  x <- as.matrix(x)
  converge <- FALSE                    # initialising converge as FALSE
  n <- dim(x)[1]                       # number of observations (rows)
  p <- dim(x)[2]                       # total number of variables
  BetaHat <- rep(0, p)                 # initialising BetaHat
  BetaHat.star <- rep(0, p)            # initialising BetaHat.star
  newBetaHat <- rep(0, p)              # initialising newBetaHat
  r <- matrix(rep(0, n*p), ncol = p)   # creating the partial-residual matrix
  while(!converge){
    for(j in 1:p){
      r[,j] <- y - x[,-j] %*% BetaHat[-j]                             # computing partial residuals
      BetaHat.star[j] <- (t(x[,j]) %*% r[,j]) / (t(x[,j]) %*% x[,j])  # least squares coefficient of the residuals
      newBetaHat[j] <- sign(BetaHat.star[j]) *
        pmax(abs(BetaHat.star[j]) - ((n*lambda[1] / (t(x[,j]) %*% x[,j])) / (1 + 2*lambda[2])), 0)  # soft thresholding; the l1 threshold is divided by (1 + 2*lambda[2])
      if(all(round(BetaHat, 5) == round(newBetaHat, 5))){              # stop once the coefficients stop changing
        converge <- TRUE
        break
      } else {
        BetaHat <- newBetaHat
      }
    }
  }
  return(BetaHat)
}
```
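As a quick sanity check (again an illustration of our own, with data, seed and penalty values chosen arbitrarily), setting $\lambda_2 = 0$ makes the Elastic Net update identical to the Lasso update, so the two functions should return the same coefficients:
```{r}
# Illustrative check (data, seed and lambda values are ours): with lambda_2 = 0
# the Elastic Net soft-thresholding update reduces to the Lasso update.
set.seed(2)
xChk <- matrix(rnorm(60 * 5), ncol = 5)
yChk <- xChk %*% c(1, 0, 0, 2, 0) + rnorm(60)
cbind(Lasso = coordDescAlg.Lasso(0.3, yChk, xChk),
      ElNet = coordDescAlg.ElNet(c(0.3, 0), yChk, xChk))
```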
#### Selecting the Regularisation parameters for Lasso and Elastic Net
To implement the Lasso and the Elastic Net as efficient feature selection approaches, we need a method to select the values of the tuning parameters ($\lambda$ for the Lasso; $\lambda_1$ and $\lambda_2$ for the Elastic Net). We have done this using the validation set approach for both problems. Under this approach, we divide the dataset into a training, a validation, and a test set, and use the training set to fit the model. We do not select these subsets at random because the dataset is already randomly generated. We then evaluate the fitted model on the validation set for different values of $\lambda$, calculate the corresponding MSE, and find the value of $\lambda$ that minimises the MSE using our validation function below. For the Lasso, we use the optim() function with the Brent method, which is designed for one-dimensional minimisation, to find the $\lambda$ with the minimum MSE; for the Elastic Net, we use the L-BFGS-B method, as the optimisation is two-dimensional. This function thereby finds the optimal values of the tuning parameters for both the Lasso ($\lambda$) and the Elastic Net ($\lambda_1, \lambda_2$), and returns the corresponding coefficient estimates (BetaHat) and the MSE.
```{r}
# Called by the Validation function; calculates the validation MSE for a given lambda (Lasso).
MSEValid.Lasso <- function(lambda, x, y, sizes){
  testSize <- sizes[1]                               # size of the training dataset
  validSize <- sizes[2]                              # size of the validation dataset
  ytrain <- y[1:testSize]                            # selecting the training data
  xtrain <- x[1:testSize,]
  yvalid <- y[(testSize+1):(testSize + validSize)]   # selecting the validation set
  xvalid <- x[(testSize+1):(testSize + validSize),]
  BetaHat <- coordDescAlg.Lasso(lambda, ytrain, xtrain)   # fitting the Lasso for the given lambda
  MSE <- mean((yvalid - xvalid %*% BetaHat)^2)            # validation MSE for the given lambda
  return(MSE)
}

# Called by the Validation function; calculates the validation MSE for a given lambda pair (Elastic Net).
MSEValid.ElasticNet <- function(lambda, x, y, sizes){
  testSize <- sizes[1]                               # size of the training dataset
  validSize <- sizes[2]                              # size of the validation dataset
  ytrain <- y[1:testSize]                            # selecting the training data
  xtrain <- x[1:testSize,]
  yvalid <- y[(testSize+1):(testSize + validSize)]   # selecting the validation set
  xvalid <- x[(testSize+1):(testSize + validSize),]
  BetaHat <- coordDescAlg.ElNet(lambda, ytrain, xtrain)   # fitting the Elastic Net for the given lambdas
  MSE <- mean((yvalid - xvalid %*% BetaHat)^2)            # validation MSE for the given lambdas
  return(MSE)
}

Validation <- function(x, y, sizes){
  testSize <- sizes[1]
  # minimise the validation MSE over lambda for the Lasso (one-dimensional, Brent)
  MinMSE.Lasso <- optim(c(0.0001), MSEValid.Lasso, x = x, y = y, sizes = sizes,
                        method = "Brent", lower = 0, upper = 4)
  BetaHat.Lasso <- coordDescAlg.Lasso(MinMSE.Lasso$par, y[1:testSize], x[1:testSize,])
  # minimise the validation MSE over (lambda_1, lambda_2) for the Elastic Net (L-BFGS-B)
  MinMSE.ElasticNet <- optim(c(0.0001, 0.0001), MSEValid.ElasticNet, x = x, y = y, sizes = sizes,
                             method = "L-BFGS-B", lower = c(0, 0), upper = c(4, 1))
  BetaHat.ElasticNet <- coordDescAlg.ElNet(MinMSE.ElasticNet$par, y[1:testSize], x[1:testSize,])
  return(list(BetaHat.Lasso, MinMSE.Lasso$value, BetaHat.ElasticNet, MinMSE.ElasticNet$value))
}
```
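To illustrate how this validation search might be used (the data, dimensions and seed below are our own choices, not part of the reported analysis): the first `sizes[1]` rows act as the training set and the next `sizes[2]` rows as the validation set.
```{r}
# Illustrative call of the validation search on simulated data (ours).
set.seed(3)
xVal <- matrix(rnorm(80 * 4), ncol = 4)
yVal <- xVal %*% c(3, 0, 1.5, 0) + rnorm(80)
valOut <- Validation(x = xVal, y = yVal, sizes = c(20, 20))
valOut[[1]]  # Lasso coefficients fitted at the selected lambda
valOut[[2]]  # validation MSE achieved by the Lasso at that lambda
```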
#### Simulations
The Elastic Net aims to overcome the limitations of the Lasso, and is expected to perform well not only in prediction accuracy but also in variable selection. Therefore, in order to show this in action, we carried out several different simulations.
To carry out the simulation process, we first simulate the data using our function `mySample`. This randomly generates a multivariate sample $\pmb X$ in which each column follows a standard normal distribution, with the correlation between two variables given by $corr(i,j) = \rho^{|i-j|}$; in the baseline case we set $\rho = 0.5$. Since each variable is standard normal, the covariance matrix is equal to this correlation matrix (our function `covMatrix`). Our dependent variable is given by $\pmb y = \pmb X\pmb\beta + \sigma\epsilon$ with $\epsilon \sim N(\pmb 0, \pmb I_n)$, where $\epsilon$ is the error and $\sigma$ is its standard deviation. Our baseline case is $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^{T}$ and $\sigma = 3$. We generate a sample of 240 observations, of which 20 form the training set, 20 form the validation set, and the remaining 200 form the test set. We do not need to select these at random, since each observation is independently and identically distributed. We then create a function called `Testing` which runs one round of the simulation: it generates one data set, estimates the model on the training data using our Lasso and Elastic Net functions, optimises the tuning parameters using the validation data, and returns the MSE at the selected parameters (in our implementation this is the validation-set MSE) together with the number of non-zero coefficients. Finally, the `Simulation` function runs multiple rounds at a time and returns the number of non-zero coefficients and the MSE for each respective model. In our baseline case, we run 50 simulations.
The parameters which we have are as follows, with the baseline values given:
\begin{table}
\caption{Baseline Values}
\centering
\begin{tabular}{c c}
\hline\hline
Parameters & Values \\[0.5ex]
\hline
Number of simulations & 50 \\
Number of observations & 240 \\
Size of Training Set & 20 \\
Size of Validation Set & 20 \\
$\beta$ & $(3, 1.5, 0, 0, 2, 0, 0, 0)^{T}$\\
$\sigma$ & 3 \\
$\rho$ & 0.5 \\[1ex]
\hline
\end{tabular}
\end{table}
```{r}
library(MASS)  # for mvrnorm, used in mySample below

# Runs the whole simulation study: repeats Testing 'times' times and stacks the results.
Simulation <- function(times, obs, trainsize, validsize, rho, errorSD, betaVals){
  Output <- NULL
  for(i in 1:times){
    Output <- rbind(Output, Testing(n = obs, sizes = c(trainsize, validsize), rho = rho,
                                    errorSD = errorSD, betaVals = betaVals))
    print(i)  # progress indicator
  }
  return(Output)
}

# Called by Simulation; runs one round of the simulation and returns the number of
# non-zero coefficients and the MSE at the optimal lambda for each model.
Testing <- function(n, sizes, rho, errorSD, betaVals){
  SampleOut <- mySample(n, rho, errorSD, betaVals)
  x <- SampleOut[[1]]
  y <- SampleOut[[2]]
  Output <- Validation(x = x, y = y, sizes = sizes)
  NumbNonZeros.Lasso <- sum(Output[[1]] != 0)   # non-zero coefficients from the Lasso
  NumbNonZeros.ElNet <- sum(Output[[3]] != 0)   # non-zero coefficients from the Elastic Net
  MSE.Lasso <- Output[[2]]
  MSE.ElNet <- Output[[4]]
  return(c(NumbNonZeros.Lasso, MSE.Lasso, NumbNonZeros.ElNet, MSE.ElNet))
}

# Called by Testing; randomly generates one data set.
mySample <- function(n, rho, errorSD, betaVals){
  p <- dim(betaVals)[1]
  SigmaMat <- covMatrix(rho, p)
  mu <- rep(0, p)
  x <- mvrnorm(n = n, mu = mu, Sigma = SigmaMat)
  error <- rnorm(n)
  y <- x %*% betaVals + errorSD*error
  return(list(x, y))
}

# Called by mySample; creates the variance-covariance matrix with corr(i,j) = rho^|i-j|.
covMatrix <- function(rho, p){
  covMatrix <- matrix(rep(0, p^2), ncol = p)  # initialising covMatrix
  for(i in 1:p){
    for(j in 1:p){
      covMatrix[i,j] <- rho^abs(i-j)
    }
  }
  return(covMatrix)
}
```
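As a small check of the data-generating process (our own illustration, not part of the reported simulations), the empirical correlation matrix of a large sample from `mySample` should be close to $\rho^{|i-j|}$:
```{r}
# Illustrative check of the simulated correlation structure (sample size and
# seed are our own choices).
set.seed(4)
chk <- mySample(n = 5000, rho = 0.5, errorSD = 3,
                betaVals = matrix(c(3, 1.5, 0, 0, 2, 0, 0, 0), ncol = 1))
round(cor(chk[[1]])[1:4, 1:4], 2)  # approximately 0.5^|i-j|
```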
##### Baseline Results
The baseline results are shown in the table below:
\begin{table}[!htbp] \centering
\caption{Baseline Simulation Results}
\begin{tabular}{@{\extracolsep{5pt}} ccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
& Mean & Var \\
\hline \\[-1.8ex]
NonZeroCoefs.Lasso & $3.560$ & $4.415$ \\
MSE.Lasso & $12.671$ & $25.843$ \\
NonZeroCoefs.ElNet & $5.080$ & $6.034$ \\
MSE.ElNet & $12.119$ & $28.183$ \\
\hline \\[-1.8ex]
\end{tabular}
\end{table}
Our baseline results show the Elastic Net outperforming the Lasso, with a lower mean squared error across our simulations. This might be because of the Elastic Net's grouped selection property, which tends to treat highly correlated variables similarly, whether removing or keeping them. In addition, the number of non-zero coefficients was higher for the Elastic Net, which is consistent with the form of its penalty.
We are then curious about how each specific parameter might affect the results. Thus, we run simulations over various values of one parameter while keeping all the others constant, and then graph the changes in MSE and in the number of non-zero coefficients as a function of the changing parameter. The graphs show the mean for the Lasso in red and the mean for the Elastic Net in blue, with a shaded area of one standard error around the mean. For the most part, the values of the other parameters are kept at the baseline values for consistency.
As we will see in all the results, the mean MSE for the Lasso is generally higher than that of the Elastic Net; however, this is not always the case, and the two are always very close. The difference in the number of non-zero coefficients is more apparent: the Lasso is always lower.
##### a. Changing the number of simulations
We used numbers of simulations in the range of 5 to 100, in intervals of 5 simulations. The other parameters are all set to the baseline values. The results are shown below:
ADD PLOT
The number of simulations does not appear to have a large effect on either the MSE or the number of non-zero coefficients. While it makes sense that it would not have a large effect, we had expected the results to stabilise more at higher numbers of simulations.
##### b. Changing the number of observations
We used the number of observations in the range of 50 to 1000, in intervals of 25 observations. The other parameters are all set to the baseline values. The results are shown below:
ADD PLOT
The number of observations does not appear to have a large effect on either the MSE or the number of non-zero coefficients. This makes sense because we are still training on a training set of the same size, which is what the model is based on.
##### c. Changing the size of the training dataset
We varied the size of the training set from 15 to 210 observations, in intervals of 5. The other parameters are all set to the baseline values. The results are shown below:
ADD PLOT
The MSE of both models drops off at the start as the size of the training set increases, but then levels off after a while. This makes sense, as a larger training set means that we can better estimate the true model and minimises the effect of any outliers. If most of the observations in a small training set happen to be similar but not representative of the overall distribution, the estimates will be skewed; however, as the training size increases, the law of large numbers makes this less likely. The number of non-zero coefficients increases as the size of the training set increases, but again levels off after a while, probably for similar reasons.
##### d. Changing the size of the validation dataset
We varied the size of the validation set from 15 to 210 observations, in intervals of 5. The other parameters are all set to the baseline values. The results are shown below:
The size of the validation set does not appear to have a large effect on either the MSE or the number of non-zero coefficients. This makes sense because we are still training on a training set of the same size, which is what the model is based on.
##### e. Changing sparsity levels of Beta
We used the following $\beta$ values: $\beta = (3, 1.5, -2, 4, 2, -1.3, 6, -3)^T$, where each successive iteration set one additional value of $\beta$ to 0, starting from the last, to increase the sparsity. For example, our second run used $\beta = (3, 1.5, -2, 4, 2, -1.3, 6, 0)^T$, our third run used $\beta = (3, 1.5, -2, 4, 2, -1.3, 0, 0)^T$, etc. For most of the other parameters we used the baseline values, with the exception of $\rho$: to prevent the correlation between variables from affecting the results, we set $\rho = 0$ so that each variable is independent. The results are shown below.
For low sparsity levels, the MSE of the Lasso is much higher than that of the Elastic Net, but this goes down as the sparsity increases until they are essentially the same when $\beta = (0,0,0,0,0,0,0,0)^T$. In addition, the standard error of the Lasso's MSE is very large at low sparsity levels and slowly decreases to being much lower, which makes sense because of the penalty term, under which the Lasso estimates high numbers of zero coefficients; the Elastic Net penalty, in contrast, does not produce as many zero coefficients. The Elastic Net seems to be a lot more stable across sparsity levels and definitely outperforms the Lasso at low sparsity levels; at high sparsity levels, they are very similar. For the number of non-zero coefficients, the Elastic Net is very close to 0 non-zero coefficients at low sparsity levels, as it should be, and its standard error is very small. As the sparsity level increases, the number of non-zero coefficients decreases, as it should, though the standard error increases. In contrast, at low sparsity levels the Lasso has lower numbers of non-zero coefficients compared to the Elastic Net, which is not good, and also has extremely high standard errors. As the sparsity level increases, the number of non-zero coefficients goes to 0 as it should, and the standard error greatly decreases.
##### f. Changing sigma
We used $\sigma$ values ranging from -10 to 10 in intervals of 0.5. The other parameters were set to the baseline values. The results are shown below.
ADD PLOT
The change in $\sigma$ seemed to have fairly similar effects on the MSE of both models: when $|\sigma|$ is high, both have larger MSEs with higher standard errors, as expected, since the error then has a larger effect on $\pmb y$. The two models are very close, though the Elastic Net actually seems to have a slightly higher MSE. As $|\sigma|$ gets closer to 0, the MSEs and their standard errors both decrease in an almost parabolic shape, with the Elastic Net having the lower MSE.
For the number of non-zero coefficients, this is lower at high levels of $|\sigma|$, though the two models are very similar to each other. As the value of $|\sigma|$ decreases, the number of non-zero coefficients increases, far more for the Elastic Net than for the Lasso. In addition, the standard errors decrease for the Elastic Net as $\sigma$ approaches 0, but not for the Lasso.
##### g. Changing Rho
For $\rho$, we used values ranging from -0.95 to 0.95 in intervals of 0.05. All the other parameters are set to the baseline values. The results are shown below.
ADD PLOT
There does not appear to be a very obvious pattern in the MSEs. At very high levels of $|\rho|$, the MSE does appear to decrease slightly for both models, though a bit more so for the Lasso than for the Elastic Net. The number of non-zero coefficients again shows no very obvious pattern, though the means for both models increase as $\rho$ increases and the standard errors decrease a little as $\rho$ increases. For the Elastic Net, this might be because of the grouped selection property mentioned above.
#### Conclusion
In general, the Elastic Net seems to have a lower mean squared error than the Lasso, and the Lasso tends to set more of the $\beta$ coefficients to zero. However, this can change depending on the circumstances, as we show in our simulations, where we change the parameters one at a time and show how this affects the MSE and the number of non-zero coefficients. A number of the parameters did not change the outcome much, such as the number of simulations, the number of observations, and the size of the validation set. The value of $\rho$ had a slight impact on the MSE and on the number of non-zero $\beta$ coefficients, but not much. The parameters with the largest effect were the size of the training data set, the sparsity level of $\beta$, and the value of $\sigma$.
Unfortunately, a lot of these parameters are unknown in the real world, so we will almost never know which situation we are in, and thus which model would be better. However, we can always increase the size of the training set compared to the size of the validation and test sets which improves the model accuracy for both the algorithms. We could estimate the variance-covariance matrix so we have some idea of how the variables are correlated with each other. However, the true sparsity level of $\beta$ and the value of $\sigma$ will never be known unless it is a simulated data set such as this.
If we are interested in decreasing the number of variables, the Lasso tends to be a better choice; if we are interested in model accuracy or in preserving the effects of more variables for analysis, then the Elastic Net tends to be better.
\newpage
#### References
* Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B 67(2): 301-320.
* James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning (ISLR), Chapter 6.
* Friedman, J., Hastie, T. and Hofling, H. (2007). Pathwise coordinate optimization, The Annals of Applied Statistics 1: 302-332.
* Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B 58: 267-288.
* Tseng, P. (1988). Coordinate ascent for maximizing nondifferentiable concave functions, Technical Report.
* Tseng, P. (2001). Convergence of block coordinate descent method for nondifferentiable minimization, J. Opt. Theory Appl. 109: 474-494.
* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929880/#R5
\newpage
#### Appendix
```{r}
library(MASS)     # for mvrnorm
library(ggplot2)  # for the plots below

coordDescAlg.Lasso <- function(lambda, y, x){
  x <- as.matrix(x)
  converge <- FALSE                    # initialising converge as FALSE
  n <- dim(x)[1]                       # number of observations (rows)
  p <- dim(x)[2]                       # number of variables (columns)
  BetaHat <- rep(0, p)
  BetaHat.star <- rep(0, p)
  newBetaHat <- rep(0, p)
  r <- matrix(rep(0, n*p), ncol = p)   # partial-residual matrix
  while(!converge){
    for(j in 1:p){
      r[,j] <- y - x[,-j] %*% BetaHat[-j]                             # computing partial residuals
      BetaHat.star[j] <- (t(x[,j]) %*% r[,j]) / (t(x[,j]) %*% x[,j])  # least squares coefficient of the residuals
      newBetaHat[j] <- sign(BetaHat.star[j]) *
        pmax(abs(BetaHat.star[j]) - n*(lambda / (t(x[,j]) %*% x[,j])), 0)  # soft thresholding
      if(all(round(BetaHat, 5) == round(newBetaHat, 5))){              # stop once the coefficients stop changing
        converge <- TRUE
        break
      } else {
        BetaHat <- newBetaHat
      }
    }
  }
  return(BetaHat)
}

coordDescAlg.ElNet <- function(lambda, y, x){
  # lambda has two components: lambda[1] (l1 penalty) and lambda[2] (l2 penalty)
  x <- as.matrix(x)
  converge <- FALSE
  n <- dim(x)[1]
  p <- dim(x)[2]
  BetaHat <- rep(0, p)
  BetaHat.star <- rep(0, p)
  newBetaHat <- rep(0, p)
  r <- matrix(rep(0, n*p), ncol = p)
  while(!converge){
    for(j in 1:p){
      r[,j] <- y - x[,-j] %*% BetaHat[-j]                             # computing partial residuals
      BetaHat.star[j] <- (t(x[,j]) %*% r[,j]) / (t(x[,j]) %*% x[,j])  # least squares coefficient of the residuals
      newBetaHat[j] <- sign(BetaHat.star[j]) *
        pmax(abs(BetaHat.star[j]) - ((n*lambda[1] / (t(x[,j]) %*% x[,j])) / (1 + 2*lambda[2])), 0)  # soft thresholding
      if(all(round(BetaHat, 5) == round(newBetaHat, 5))){              # stop once the coefficients stop changing
        converge <- TRUE
        break
      } else {
        BetaHat <- newBetaHat
      }
    }
  }
  return(BetaHat)
}
# Runs the whole simulation study: repeats Testing 'times' times and stacks the results.
Simulation <- function(times, obs, trainsize, validsize, rho, errorSD, betaVals){
  Output <- NULL
  for(i in 1:times){
    Output <- rbind(Output, Testing(n = obs, sizes = c(trainsize, validsize), rho = rho,
                                    errorSD = errorSD, betaVals = betaVals))
    print(i)  # progress indicator
  }
  return(Output)
}

# Called by Simulation; runs one round of the simulation and returns the number of
# non-zero coefficients and the MSE at the optimal lambda for each model.
Testing <- function(n, sizes, rho, errorSD, betaVals){
  SampleOut <- mySample(n, rho, errorSD, betaVals)
  x <- SampleOut[[1]]
  y <- SampleOut[[2]]
  Output <- Validation(x = x, y = y, sizes = sizes)
  NumbNonZeros.Lasso <- sum(Output[[1]] != 0)   # non-zero coefficients from the Lasso
  NumbNonZeros.ElNet <- sum(Output[[3]] != 0)   # non-zero coefficients from the Elastic Net
  MSE.Lasso <- Output[[2]]
  MSE.ElNet <- Output[[4]]
  return(c(NumbNonZeros.Lasso, MSE.Lasso, NumbNonZeros.ElNet, MSE.ElNet))
}

# Called by Testing; randomly generates one data set.
mySample <- function(n, rho, errorSD, betaVals){
  p <- dim(betaVals)[1]
  SigmaMat <- covMatrix(rho, p)
  mu <- rep(0, p)
  x <- mvrnorm(n = n, mu = mu, Sigma = SigmaMat)
  error <- rnorm(n)
  y <- x %*% betaVals + errorSD*error
  return(list(x, y))
}

# Called by mySample; creates the variance-covariance matrix with corr(i,j) = rho^|i-j|.
covMatrix <- function(rho, p){
  covMatrix <- matrix(rep(0, p^2), ncol = p)  # initialising covMatrix
  for(i in 1:p){
    for(j in 1:p){
      covMatrix[i,j] <- rho^abs(i-j)
    }
  }
  return(covMatrix)
}

Validation <- function(x, y, sizes){
  testSize <- sizes[1]
  # minimise the validation MSE over lambda for the Lasso (one-dimensional, Brent)
  MinMSE.Lasso <- optim(c(0.0001), MSEValid.Lasso, x = x, y = y, sizes = sizes,
                        method = "Brent", lower = 0, upper = 4)
  BetaHat.Lasso <- coordDescAlg.Lasso(MinMSE.Lasso$par, y[1:testSize], x[1:testSize,])
  # minimise the validation MSE over (lambda_1, lambda_2) for the Elastic Net (L-BFGS-B)
  MinMSE.ElasticNet <- optim(c(0.0001, 0.0001), MSEValid.ElasticNet, x = x, y = y, sizes = sizes,
                             method = "L-BFGS-B", lower = c(0, 0), upper = c(4, 1))
  BetaHat.ElasticNet <- coordDescAlg.ElNet(MinMSE.ElasticNet$par, y[1:testSize], x[1:testSize,])
  return(list(BetaHat.Lasso, MinMSE.Lasso$value, BetaHat.ElasticNet, MinMSE.ElasticNet$value))
}

# Called by Validation; calculates the validation MSE for a given lambda (Lasso).
MSEValid.Lasso <- function(lambda, x, y, sizes){
  testSize <- sizes[1]                               # size of the training dataset
  validSize <- sizes[2]                              # size of the validation dataset
  ytrain <- y[1:testSize]                            # selecting the training data
  xtrain <- x[1:testSize,]
  yvalid <- y[(testSize+1):(testSize + validSize)]   # selecting the validation set
  xvalid <- x[(testSize+1):(testSize + validSize),]
  BetaHat <- coordDescAlg.Lasso(lambda, ytrain, xtrain)   # fitting the Lasso for the given lambda
  MSE <- mean((yvalid - xvalid %*% BetaHat)^2)            # validation MSE for the given lambda
  return(MSE)
}

# Called by Validation; calculates the validation MSE for a given lambda pair (Elastic Net).
MSEValid.ElasticNet <- function(lambda, x, y, sizes){
  testSize <- sizes[1]                               # size of the training dataset
  validSize <- sizes[2]                              # size of the validation dataset
  ytrain <- y[1:testSize]                            # selecting the training data
  xtrain <- x[1:testSize,]
  yvalid <- y[(testSize+1):(testSize + validSize)]   # selecting the validation set
  xvalid <- x[(testSize+1):(testSize + validSize),]
  BetaHat <- coordDescAlg.ElNet(lambda, ytrain, xtrain)   # fitting the Elastic Net for the given lambdas
  MSE <- mean((yvalid - xvalid %*% BetaHat)^2)            # validation MSE for the given lambdas
  return(MSE)
}
#Baseline
output <- Simulation(times = 50, obs = 240, trainsize = 20, validsize = 20, rho = 0.5, errorSD = 3, betaVals = matrix(c(3,1.5,0,0,2,0,0,0), ncol = 1))
mysummary <- cbind(colMeans(output),apply(output, 2, var))
rownames(mysummary) <- c("NonZeroCoefs.Lasso", "MSE.Lasso", "NonZeroCoefs.ElNet", "MSE.ElNet")
colnames(mysummary) <- c("Mean", "Var")
mysummary
meanAndvar <- function(x){
  cbind(colMeans(x), apply(x, 2, var))
}
#Change number of simulations
NSims <- seq(5,100,5)
outputSims <- lapply(NSims, Simulation, obs = 240, trainsize = 20, validsize = 20, rho = 0.5, errorSD = 3, betaVals = matrix(c(3,1.5,0,0,2,0,0,0), ncol = 1))
mysummary <- lapply(outputSims, meanAndvar)
NonZeroCoefsSummary.Lasso <- data.frame("NSims" = NSims, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 1,1), "Var" = sapply(mysummary, "[[", 1,2))
NonZeroCoefsSummary.ElNet <- data.frame("NSims" = NSims, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 3,1), "Var" = sapply(mysummary, "[[", 3,2))
NonZeroCoefsSummary <- rbind(NonZeroCoefsSummary.Lasso, NonZeroCoefsSummary.ElNet)
MSEsummary.Lasso <- data.frame("NSims" = NSims, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 2,1), "Var" = sapply(mysummary, "[[", 2,2))
MSEsummary.ElNet <- data.frame("NSims" = NSims, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 4,1), "Var" = sapply(mysummary, "[[", 4,2))
MSEsummary <- rbind(MSEsummary.Lasso, MSEsummary.ElNet)
ggplot(MSEsummary) + ggtitle("MSE by Number of Simulations") +
geom_line(aes(x = NSims, y = Mean, color = Mod)) +
geom_ribbon(aes(x= NSims, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
ggplot(NonZeroCoefsSummary) + ggtitle("Non-Zero Coefficients by Number of Simulations") +
geom_line(aes(x = NSims, y = Mean, color = Mod)) +
geom_ribbon(aes(x= NSims, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
#Change number of observations
Nobs <- seq(50,1000,25)
outputNobs <- lapply(Nobs, Simulation, times = 50, trainsize = 20, validsize = 20, rho = 0.5, errorSD = 3, betaVals = matrix(c(3,1.5,0,0,2,0,0,0), ncol = 1))
mysummary <- lapply(outputNobs, meanAndvar)
NonZeroCoefsSummary.Lasso <- data.frame("Nobs" = Nobs, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 1,1), "Var" = sapply(mysummary, "[[", 1,2))
NonZeroCoefsSummary.ElNet <- data.frame("Nobs" = Nobs, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 3,1), "Var" = sapply(mysummary, "[[", 3,2))
NonZeroCoefsSummary <- rbind(NonZeroCoefsSummary.Lasso, NonZeroCoefsSummary.ElNet)
MSEsummary.Lasso <- data.frame("Nobs" = Nobs, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 2,1), "Var" = sapply(mysummary, "[[", 2,2))
MSEsummary.ElNet <- data.frame("Nobs" = Nobs, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 4,1), "Var" = sapply(mysummary, "[[", 4,2))
MSEsummary <- rbind(MSEsummary.Lasso, MSEsummary.ElNet)
ggplot(MSEsummary) + ggtitle("MSE by Number of Observations") +
geom_line(aes(x = Nobs, y = Mean, color = Mod)) +
geom_ribbon(aes(x= Nobs, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
ggplot(NonZeroCoefsSummary) + ggtitle("Non-Zero Coefficients by Number of Observations") +
geom_line(aes(x = Nobs, y = Mean, color = Mod)) +
geom_ribbon(aes(x= Nobs, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
#Change size of training data
TrainNumb <- seq(15,210,5)
outputTrainNumb <- lapply(TrainNumb, Simulation, times = 50, obs = 240, validsize = 20, rho = 0.5, errorSD = 3, betaVals = matrix(c(3,1.5,0,0,2,0,0,0), ncol = 1))
mysummary <- lapply(outputTrainNumb, meanAndvar)
NonZeroCoefsSummary.Lasso <- data.frame("TrainNumb" = TrainNumb, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 1,1), "Var" = sapply(mysummary, "[[", 1,2))
NonZeroCoefsSummary.ElNet <- data.frame("TrainNumb" = TrainNumb, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 3,1), "Var" = sapply(mysummary, "[[", 3,2))
NonZeroCoefsSummary <- rbind(NonZeroCoefsSummary.Lasso, NonZeroCoefsSummary.ElNet)
MSEsummary.Lasso <- data.frame("TrainNumb" = TrainNumb, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 2,1), "Var" = sapply(mysummary, "[[", 2,2))
MSEsummary.ElNet <- data.frame("TrainNumb" = TrainNumb, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 4,1), "Var" = sapply(mysummary, "[[", 4,2))
MSEsummary <- rbind(MSEsummary.Lasso, MSEsummary.ElNet)
ggplot(MSEsummary) + ggtitle("MSE by Size of Training Set") +
geom_line(aes(x = TrainNumb, y = Mean, color = Mod)) +
geom_ribbon(aes(x= TrainNumb, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
ggplot(NonZeroCoefsSummary) + ggtitle("Non-Zero Coefficients by Size of Training Set") +
geom_line(aes(x = TrainNumb, y = Mean, color = Mod)) +
geom_ribbon(aes(x= TrainNumb, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
#Change size of validation data
ValidNumb <- seq(15,210,5)
outputValidNumb <- lapply(ValidNumb, Simulation, times = 50, obs = 240, trainsize = 20, rho = 0.5, errorSD = 3, betaVals = matrix(c(3,1.5,0,0,2,0,0,0), ncol = 1))
mysummary <- lapply(outputValidNumb, meanAndvar)
NonZeroCoefsSummary.Lasso <- data.frame("ValidNumb" = ValidNumb, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 1,1), "Var" = sapply(mysummary, "[[", 1,2))
NonZeroCoefsSummary.ElNet <- data.frame("ValidNumb" = ValidNumb, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 3,1), "Var" = sapply(mysummary, "[[", 3,2))
NonZeroCoefsSummary <- rbind(NonZeroCoefsSummary.Lasso, NonZeroCoefsSummary.ElNet)
MSEsummary.Lasso <- data.frame("ValidNumb" = ValidNumb, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 2,1), "Var" = sapply(mysummary, "[[", 2,2))
MSEsummary.ElNet <- data.frame("ValidNumb" = ValidNumb, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 4,1), "Var" = sapply(mysummary, "[[", 4,2))
MSEsummary <- rbind(MSEsummary.Lasso, MSEsummary.ElNet)
ggplot(MSEsummary) + ggtitle("MSE by Size of Validation Set") +
geom_line(aes(x = ValidNumb, y = Mean, color = Mod)) +
geom_ribbon(aes(x= ValidNumb, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
ggplot(NonZeroCoefsSummary) + ggtitle("Non-Zero Coefficients by Size of Validation Set") +
geom_line(aes(x = ValidNumb, y = Mean, color = Mod)) +
geom_ribbon(aes(x= ValidNumb, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
#Change sparsity levels of beta
BetaVals.Sparsity <- list(matrix(c(0,0,0,0,0,0,0,0), ncol = 1),
matrix(c(3,0,0,0,0,0,0,0), ncol = 1),
matrix(c(3,1.5,0,0,0,0,0,0), ncol = 1),
matrix(c(3,1.5,-2,0,0,0,0,0), ncol = 1),
matrix(c(3,1.5,-2,4,0,0,0,0), ncol = 1),
matrix(c(3,1.5,-2,4,2,0,0,0), ncol = 1),
matrix(c(3,1.5,-2,4,2,-1.3,0,0), ncol = 1),
matrix(c(3,1.5,-2,4,2,-1.3,6,0), ncol = 1),
matrix(c(3,1.5,-2,4,2,-1.3,6,-3), ncol = 1))
SparsityLevels <- seq(1,0, by = -1/8)
outputSparsity <- lapply(BetaVals.Sparsity, Simulation, times = 20, obs = 240, trainsize = 20, validsize = 20, rho = 0, errorSD = 3)
mysummary <- lapply(outputSparsity, meanAndvar)
NonZeroCoefsSummary.Lasso <- data.frame("SparsityLevels" = SparsityLevels, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 1,1), "Var" = sapply(mysummary, "[[", 1,2))
NonZeroCoefsSummary.ElNet <- data.frame("SparsityLevels" = SparsityLevels, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 3,1), "Var" = sapply(mysummary, "[[", 3,2))
NonZeroCoefsSummary <- rbind(NonZeroCoefsSummary.Lasso, NonZeroCoefsSummary.ElNet)
MSEsummary.Lasso <- data.frame("SparsityLevels" = SparsityLevels, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 2,1), "Var" = sapply(mysummary, "[[", 2,2))
MSEsummary.ElNet <- data.frame("SparsityLevels" = SparsityLevels, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 4,1), "Var" = sapply(mysummary, "[[", 4,2))
MSEsummary <- rbind(MSEsummary.Lasso, MSEsummary.ElNet)
ggplot(MSEsummary) + ggtitle("MSE by Sparsity Levels") +
geom_line(aes(x = SparsityLevels, y = Mean, color = Mod)) +
geom_ribbon(aes(x= SparsityLevels, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
ggplot(NonZeroCoefsSummary) + ggtitle("Non-Zero Coefficients by Sparsity Levels") +
geom_line(aes(x = SparsityLevels, y = Mean, color = Mod)) +
geom_ribbon(aes(x= SparsityLevels, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
#Change rho
RhoVals <- seq(-.95, .95, .05)
outputrho <- lapply(RhoVals, Simulation, times = 50, obs = 240, trainsize = 20, validsize = 20, errorSD = 3, betaVals = matrix(c(3,1.5,0,0,2,0,0,0), ncol = 1))
mysummary <- lapply(outputrho, meanAndvar)
NonZeroCoefsSummary.Lasso <- data.frame("RhoVals" = RhoVals, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 1,1), "Var" = sapply(mysummary, "[[", 1,2))
NonZeroCoefsSummary.ElNet <- data.frame("RhoVals" = RhoVals, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 3,1), "Var" = sapply(mysummary, "[[", 3,2))
NonZeroCoefsSummary <- rbind(NonZeroCoefsSummary.Lasso, NonZeroCoefsSummary.ElNet)
MSEsummary.Lasso <- data.frame("RhoVals" = RhoVals, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 2,1), "Var" = sapply(mysummary, "[[", 2,2))
MSEsummary.ElNet <- data.frame("RhoVals" = RhoVals, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 4,1), "Var" = sapply(mysummary, "[[", 4,2))
MSEsummary <- rbind(MSEsummary.Lasso, MSEsummary.ElNet)
ggplot(MSEsummary) + ggtitle("MSE by Correlation between Variables") +
geom_line(aes(x = RhoVals, y = Mean, color = Mod)) +
geom_ribbon(aes(x= RhoVals, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
ggplot(NonZeroCoefsSummary) + ggtitle("Non-Zero Coefficients by Correlation between Variables") +
geom_line(aes(x = RhoVals, y = Mean, color = Mod)) +
geom_ribbon(aes(x= RhoVals, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
#Change sigma
SigVals <- seq(-10, 10, .5)
outputsigma <- lapply(SigVals, Simulation, times = 50, obs = 240, trainsize = 20, validsize = 20, rho = 0.5, betaVals = matrix(c(3,1.5,0,0,2,0,0,0), ncol = 1))
mysummary <- lapply(outputsigma, meanAndvar)
NonZeroCoefsSummary.Lasso <- data.frame("SigVals" = SigVals, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 1,1), "Var" = sapply(mysummary, "[[", 1,2))
NonZeroCoefsSummary.ElNet <- data.frame("SigVals" = SigVals, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 3,1), "Var" = sapply(mysummary, "[[", 3,2))
NonZeroCoefsSummary <- rbind(NonZeroCoefsSummary.Lasso, NonZeroCoefsSummary.ElNet)
MSEsummary.Lasso <- data.frame("SigVals" = SigVals, Mod = "Lasso", "Mean" = sapply(mysummary, "[[", 2,1), "Var" = sapply(mysummary, "[[", 2,2))
MSEsummary.ElNet <- data.frame("SigVals" = SigVals, Mod = "ElNet", "Mean" = sapply(mysummary, "[[", 4,1), "Var" = sapply(mysummary, "[[", 4,2))
MSEsummary <- rbind(MSEsummary.Lasso, MSEsummary.ElNet)
ggplot(MSEsummary) + ggtitle("MSE by Standard Deviation of Error") +
geom_line(aes(x = SigVals, y = Mean, color = Mod)) +
geom_ribbon(aes(x= SigVals, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
ggplot(NonZeroCoefsSummary) + ggtitle("Non-Zero Coefficients by Standard Deviation of Error") +
geom_line(aes(x = SigVals, y = Mean, color = Mod)) +
geom_ribbon(aes(x= SigVals, ymin = Mean - sqrt(Var), ymax = Mean + sqrt(Var), fill = Mod), alpha = 0.3)
```