09_variable_selection.Rmd

# Application 1: Variable selection {#ch-08}

## Introduction

In Chapter \@ref(ch-07), we noted that the coefficient of determination $R^2$ may increase even when uninformative predictors are added to a model. This will ascribe some predictive power to an uninformative predictor that is in fact misguided by its (random) correlation with the target variable. Often, we start out formulating models, not knowing beforehand what predictors should be considered and we are tempted to use them all because the full model will always yield the best $R^2$. In such cases, we're prone to building overconfident models that perform well on the training set, but will not perform well when predicting to new data. 

In this application session, we'll implement an algorithm that sequentially searches the best additional predictor to be included in our model, starting from a single one. This is called *stepwise-forward* regression (see definition below). There is also *stepwise-backward* regression where predictors are sequentially removed from a model that includes them all. The challenge is that we often lack the possibility to confidently assess generalisability. The effect of spuriously increasing $R^2$ by adding uninformative predictors can be mitigated, as we noted in Chapter \@ref(ch-07), by considering alternative metrics that penalize the number of predictors in a model. They balance the tradeoff between model complexity (number of variables in the linear regression case) and goodness of fit. Such metrics include the *adjusted-*$R^2$,the Akaike Information Criterion (AIC), or the Bayesian Information Criterion (BIC). In cases, where sufficient data is available, also cross-validation can be used for assessing the generalisability of alternative models. Here, we'll assess how these different metrics behave for a sequence of linear regression models with an increasing number of predictors. You'll learn to write code that implements an algorithm determining the order in which variables enter the model, starting from one and going up to fourteen predictors. You'll write your own stepwise-forward regression code.

Let's get started!

## Application
### Warm-up 1: Nested for-loop

Given a matrix A and a vector B (see below), do the following tasks:

- Replace the missing values (`NA`) in the first row of A by the largest value of B. After using that element of B for imputing A, drop that element from the vector B and proceed with imputing the second row of A, using the (now) largest value of the updated vector B, and drop that element from B after using it for imputing A. Repeat the same procedure for all four rows in A.

- After imputing (replacing) in each step, calculate the mean of the remaining values in B and record it as a single-row data frame with two columns `row_number` and `avg`, where `row_number` is the row number of A where the value was imputed, and `avg` is the mean of remaining values in B. As the algorithm proceeds through rows in A, sequentially bind the single-row data frame together so that after completion of the algorithm, the data frame contains four rows (corresponding to the number of rows in A).

```{r}
A <- matrix(c(6, 7, 3, NA, 15, 6, 7, 
              8, 9, 12, 6, 11, NA, 3, 
              9, 4, 7, 3, 21, NA, 6, 
              7, 19, 6, NA, 15, 8, 10),
            nrow = 4, byrow = TRUE)
B <- c(8, 4, 12, 9, 15, 6)
```

Before implementing these tasks, try to write down a pseudo code. This is code-like text that may not be executable, but describes the structure of real code and details where and how major steps are implemented. Next, you'll need to write actual R code. For this, you will need to find answers to the following questions: 

+ How to go through each of the element in matrix?
+ How to detect NA value?
+ How to drop an element of a given value from a vector?
+ How to add a row to an existing data frame?

**Solution**:
```{r message=FALSE}
library(tidyverse)

summ <- data.frame()

for (i in 1:nrow(A)){
  
  for (j in 1:ncol(A)){
    
    if (is.na(A[i,j])){
      A[i,j] <- max(B)
    }
  }
  

  B <- B[-which(B == max(B))] # update the B vector removing the biggest values
  summ <- bind_rows(summ, data.frame(row_number = i, avg = mean(B)))
}


summ
```


### Warm-up 2: Find the best single predictor

**The math behind forward stepwise regression:**

1. Let $\mathcal{M_0}$ denote the null model, which contains no predictors.

2. For $k=0,..,p-1$:
   (a) Consider all $p − k$ models that augment $\mathcal{M}_k$ with one additional predictor. 
   (b) Choose the best model among these $p − k$ models, and call it $\mathcal{M}_{k+1}$. Here _best_ is defined as having the highest $R^2$ .

3. Select a single best model from among $\mathcal{M}_0$, . . . , $\mathcal{M}_p$ using cross-validated prediction error, AIC, BIC, or adjusted $R^2$.


The first step of a stepwise forward regression is to find the single most powerful predictor in a univariate linear regression model for the target variable `GPP_NT_VUT_REF` among all fourteen available predictors in our data set (all except those of type `date` or `character`). Implement this first part of the search, using the definition of the stepwise-forward algorithm above. Remove all rows with at least one missing value before starting the predictor search.

- Which predictor achieves the highest $R^2$? 
- What value is the $R^2$?
- Visualise $R^2$ for all univariate models, ordered by their respective $R^2$ values.
- Do you note a particular pattern? Which variables yield similar $R^2$? How do you expect them to be included in multivariate models of the subsequent steps in the stepwise forward regression?

_Hints_: 

+ Model structure: 
   
   - The "counter" variables in the for loop can be provided as a vector, and the counter will sequentially take on the value of each element in that vector. For example: `for (var in all_predictors){ ... }`.


+ Algorithm:

   - To record $R^2$ values for the different models, you may start by creating an empty vector (`vec <- c()`) before the loop and then sequentially add elements to that vector inside the loop (`vec <- c(vec, new_element)`). Alternatively, you can do something similar, but with a data frame (initialising with `df_rsq <- data.frame()` before the loop, and adding rows by `df_rsq <- bind_rows(df_rsq, data.frame(pred = predictor_name, rsq = rsq_result))` inside the loop).

   - A clever way how to construct formulas dynamically is described, for example in [this stackoverflow post](https://stackoverflow.com/questions/4951442/formula-with-dynamic-number-of-variables).
   
+ Value retrieving: 

   - Extract the $R^2$ from the linear model object: `summary(fit_lin)[["r.squared"]]`

+ Visualising:

   - Search for solutions for how to change the order of levels to be plotted yourself.
   
```{r message=F, warning=F}
library(tidyverse)
## read CSV and drop missing values in one step
df <- read_csv("./data/ddf_for_08_application.csv") %>% 
  drop_na()

## specify target variable
target <- 'GPP_NT_VUT_REF'

## determine predictors as all except site ID, timestamp, and the target (should be 14)
preds <- df %>% 
  dplyr::select(-siteid, -TIMESTAMP, -GPP_NT_VUT_REF) %>% 
  names()

## initialise an empty data frame (necessary, because otherwise we cannot use bind_rows() below)
df_rsq <- data.frame()
# rsq_list <- c()  # alternative for vector

for (var in preds){
  
  ## create formula dynamically
  forml <- as.formula(paste(target, "~", var))
  
  ## fit linear model
  fit_lin <- lm(forml, data = df)
  
  ## extract R2 from linear model
  rsq <- summary(fit_lin)[["r.squared"]]
  
  ## add a row to the data frame that holds the results
  df_rsq <- bind_rows(df_rsq, data.frame(pred = var, rsq = rsq))
  
  # rsq_list <- c(rsq_list,rsq)  # alternative with vector
}

## print best single predictor and its R2
df_rsq %>% 
  arrange(-rsq) %>%  # arrange by R2 in descending order (highest R2 on top)
  slice(1)           # print only the first row (with highest R2)


```
```{r}
library(ggplot2)

## alternative: determine the first variable to enter into our model
# preds[which.max(rsq_list)]

## use the data frame that holds the results for plotting
df_rsq %>% 
  ggplot(aes(x = reorder(pred, rsq), y = rsq)) +
  geom_bar(stat = "identity") + 
  labs(y = expression(italic(R)^2), x = "Variable") +
  coord_flip()
```


**Solution**:

*`PPFD_IN` achieves the highest $R^2$ value, and the corresponding $R^2$ values is 0.5276123.* 

*We could see from the above plot that `LW_IN_F_MDS` and `LW_IN_F`, `VPD_F` and `VPD_F_MDS`, `SW_IN_F` and `SW_IN_F_MDS` share basically the same $R^2$ values. Those variables are highly correlated and are explaining the same amount of variance in the target variable. In the subsequent steps of the stepwise forward regression, since each of the two variables are highly correlated with each other, it's likely that only one of them will be selected into our final model.*

### Full stepwise regression

Now, we take it to the next level and implement a full stepwise forward regression as described above. For each step (number of predictors $k$), record the following metrics: $R^2$, *adjusted-*$R^2$, the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and the 5-fold cross-validation $R^2$ and RMSE.

- Write pseudo-code for how you plan to implement the algorithm first.

- Implement the algorithm in R, run it and display the order in which predictors enter the model.

- Display a table with the metrics of all $k$ steps, and the single variable, added at each step.

_Hints_: 

+ Model structure:
   
   - Recall what you learned in the breakout session, you may use the same idea on this task. Try to think of the blueprint (*pseudo-code*) first: How to go through different models in each forward step? How to store predictors added to the model and how to update candidate predictors? 

+ Algorithm:

   - A complication is that the set of predictors is sequentially complemented at each step of the search through $k$. You may again use `vec <- list()` to create an empty vector, and then add elements to that vector by `vec <- c(vec, new_element)`.

    - It may be helpful to explicitly define a set of "candidate predictors" that may potentially be added to the model as a vector (e.g., `preds_candidate`), and define predictors retained in the model from the previous step in a separate vector (e.g., `preds_retained`). In each step, search through `preds_candidate`, select the best predictor, add it to `preds_retained` and remove it from `preds_candidate`.

   - At each step, record the metrics and store them in a data frame for later plots. As in the first "warm-up" exercise, you may record metrics at each step as a single-row data frame and sequentially stack (bind) them together.

   - (As above) A clever way how to construct formulas dynamically is described, for example in [this stackoverflow post](https://stackoverflow.com/questions/4951442/formula-with-dynamic-number-of-variables).
   
   - The metrics for the $k$ models are assessed *after* the order of added variables is determined. To be able to determine the metrics, the $k$ models can be saved by constructing a list of models and sequentially add elements to that list (`mylist[[ name_new_element ]] <- new_element`). You can also fit the model again after determining which predictor worked best.
     
   - Your code will most certainly have bugs at first. To debug efficiently, write code first in a simple R script and use the debugging options in RStudio (see [here](https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio)).
  
   
+ Value retrieving

  - To get AIC and BIC values for a given model, use the base-R functions `AIC()` and `BIC()`. 
  
  - To get the cross-validated $R^2$ and RMSE, use the caret function `train()` with RMSE as the loss function, and `method = "lm"` (to fit a linear regression model). Then extract the values by `trained_model$results$Rsquared` and `trained_model$results$RMSE`.
  
+ Displaying: 

  - To display a table nicely as part of the RMarkdown html output, use the function `knitr::kable()`
  
  - To avoid reordering of the list of variable names in plotting, change the type of variable names from "character" to "factor" by `pred <- factor(pred, levels = pred)`


<!-- - remove element by index number from a vector `vec[-2]` -->
<!-- - concatenate string vectors with certain separations:  `paste(c("predictor_1","predictor_2","predictor_3"),collapse = '+')` gives `"predictor_1 + predictor_2 + predictor_3"` -->
<!-- - fit a linear regression with three predictors: `fit_lin <- lm(target ~ predictor_1 + predictor_2 + predictor_3, data = df)` -->
<!-- - execute a string-like expression: `eval(parse(text = <string>))`, for example: `eval(parse(text='mean(c(2,3,4))'))=3` -->
<!-- - get AIC and BIC from a linear regression model object: `AIC(fit_lin)`, `BIC(fit_lin)` -->
    
```{r message=FALSE}
library(caret)  # need train() for cross-validation

## specify target variable (as above)
target <- 'GPP_NT_VUT_REF'

## determine predictors as all except site ID, timestamp, and the target (should be 14)  (as above)
preds <- df %>% 
  dplyr::select(-siteid, -TIMESTAMP, -GPP_NT_VUT_REF) %>% 
  names()

# This is the vector of candidate predictors to be added in the model. To begin with, consider all as candidates.
preds_candidate <- preds 

# predictors retained in the model from the previous step. To begin with, is empty.
preds_retained <- c()

## work with lists as much as possible (more flexible!)
df_metrics <- data.frame()

## outer loop for k predictors
for (k_index in 1:length(preds)){
  
  # rsq_candidates <- c()
  df_rsq_candidates <- data.frame()
  linmod_candidates <- list()
  
  ## inner loop for single additional predictor
  for (ipred in preds_candidate){
    
    # variable vector (new variable + retained variables) used in regression
    pred_add <- c(preds_retained, ipred)
    
    # define formulate with newly-added predictor
    forml  <- as.formula(paste( target, '~', paste(pred_add, collapse = '+')))
    
    # fit linear model
    fit_lin <- lm(forml, data = df)
    
    # add model object to list, and name the element according to the added variable
    linmod_candidates[[ ipred ]] <- fit_lin
    
    # record metrics for all candidates
    rsq <- summary(fit_lin)[["r.squared"]]
    df_rsq_candidates <- bind_rows(df_rsq_candidates, data.frame(pred = ipred, rsq = rsq))  # when storing R2 in a data frame
    # rsq_candidates <- c(rsq_candidates,  rsq)  # when storing R2 as a vector
    
  }
  
  ## get name of candidate predictor that achieved the highest R2.
  pred_add <- df_rsq_candidates %>%  # when storing R2 in a data frame
    arrange(desc(rsq)) %>% 
    slice(1) %>% 
    pull(pred) %>% 
    as.character()
  
  # pred_add <- preds_candidate[ which.max(rsq_candidates) ]   # when storing R2 as a vector
  
  ## add best predictors to retained predictors 
  preds_retained <- c(preds_retained, pred_add)
  
  ## get the cross-validation R2
  forml  <- as.formula(paste( target, '~', paste(preds_retained, collapse = '+')))
  control <- trainControl(method = "cv", number = 5)
  set.seed(123)
  # cv_modl <- train(forml, data = df, method = "lm", , metric = "Rsquared", trControl = control)
  cv_modl <- train(forml, data = df, method = "lm", metric = "RMSE", trControl = control)

  # record AIC and BIC and adjusted-R2 of the respective model
  df_metrics <- df_metrics %>% 
    bind_rows(
      data.frame( pred = pred_add,
                  rsq = summary(linmod_candidates[[ pred_add ]])[["r.squared"]],
                  adj_rsq = summary(linmod_candidates[[ pred_add ]])[["adj.r.squared"]],
                  cv_rsq = cv_modl$results$Rsquared,
                  cv_rmse = cv_modl$results$RMSE,
                  aic = AIC(linmod_candidates[[ pred_add ]]),
                  bic = BIC(linmod_candidates[[ pred_add ]])
      )
    )
  

  # remove the selected variable from the candidate variable list
  preds_candidate <- preds_candidate[-which(preds_candidate == pred_add)]
  # preds_candidate <- setdiff(preds_candidate,pred_add)  # alternative
  
}

data.frame(df_metrics$pred) # order in which variables enter the model

df_metrics %>% knitr::kable()
```

- Visualise all metrics as a function of the number of predictors (add labels for the variable names of the added predictor). Highlight the best-performing model based on the respective metric. How many predictors are in the best performing model, when assessed based on each metric?

```{r eval = FALSE}
library(ggplot2)

df_metrics$pred <-  factor(df_metrics$pred, levels = df_metrics$pred)

ggplot() +
  geom_point(data = df_metrics, aes(x = pred, y = rsq)) +
  geom_point(data = filter(df_metrics, rsq == max(rsq)), aes(x = pred, y = rsq), color = "red") +
  labs(title = expression(italic(R)^2)) + 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics, aes(x = pred, y = adj_rsq)) +
  geom_point(data = filter(df_metrics, adj_rsq == max(adj_rsq)), aes(x = pred, y = adj_rsq), color = "red") +
  labs(title = expression(paste("Adjusted-", italic(R)^2))) + 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics, aes(x = pred, y = cv_rsq)) +
  geom_point(data = filter(df_metrics, cv_rsq == max(cv_rsq)), aes(x = pred, y = cv_rsq), color = "red") +
  labs(title = expression(paste("Cross-validated ", italic(R)^2))) + 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics, aes(x = pred, y = cv_rmse)) +
  geom_point(data = filter(df_metrics, cv_rmse == min(cv_rmse)), aes(x = pred, y = cv_rmse), color = "red") +
  labs(title = expression(paste("Cross-validated ", RMSE))) + 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics, aes(x = pred, y = aic)) +
  geom_point(data = filter(df_metrics, aic == min(aic)), aes(x = pred, y = aic), color = "red") +
  labs(title = "AIC")+ 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics, aes(x = pred, y = bic)) +
  geom_point(data = filter(df_metrics, bic == min(bic)), aes(x = pred, y = bic), color = "red") +
  labs(title = "BIC")+ 
  coord_flip()
```

- Observe which predictors are selected into the final model and which not. Why?

*Solution:* 
Take the best model selected by BIC as an example, `VPD_F` is selected into the model and `VPD_F_MDS` is not. They won’t enter the model both because they are highly correlated and when one is in, the other doesn’t add information, which proves our expectation in the previous question.


- Whether it matters to the order of variables entering into the model which metric (i.e. AIC, BIC or $R^2$) is used? Why?

*Solution:* 
No, it doesn't matter. For each inner loop, when deciding which variable is sequentially chosen, we compare models of the same level (i.e. same number of variables). In this sense, model complexities for all the models are the same, and AIC/BIC only compares the goodness of fit part, which works the same as $R^2$. 


- Discuss your results. Which metric do you trust most? Why? Try out your algorithm without evaluating the cross-validated $R^2$. Do you get a computational efficiency gain? Which metric comes closest to the cross-validated $R^2$ in terms of selecting the same number predictors?

*Solution:*
Cross-validated $R^2$ is the "gold-standard" for assessing generalisability. However, it is computationally costly and may not provide robust results when the data set is small. Since AIC and BIC both penalise additional predictors, they both don't select the full model with 14 variables as the best model (However, it could happen that the best model selected by AIC/BIC is the full model). The BIC selects for the same number of predictors as the cross-validated $R^2$ and RMSE, but without the disadvantage of the computational costs. It provides a conservative and apparently robust estimate for the model generalisability. Note that the order of added predictors is determined by the simple $R^2$ and is therefore independent of the final metrics we use for choosing the best $k$.


### Bonus: Stepwise regression out-of-the-box
   
In R, you can also conduct the above variable selection procedures automatically. `regsubsets()` from leaps package provides a convenient way to do so. It does model selection by exhaustive search, including forward or backward stepwise regression. 

- Do a stepwise forward regression using `regsubsets()`.
- Create a data frame with the same metrics as above (except cross-validated $R^2$).
- Visualise metrics as above.
- Are your result consistent?
    
_Hints_: 

- Specify stepwise *forward* by setting `method = "forward"`.
- Specify the number of predictors to examine, that is all fourteen, by setting `nvmax = 14`.
- AIC values of each step are stored in `summary(regfit.fwd)$cp` and BIC values of each step are in `summary(regfit.fwd)$bic`, etc.
- Get order in which predictors $k$ are added (which corresponds to values returned by `summary(regfit.fwd)$bic`), by `all_predictors[regfit.fwd$vorder[2:15]-1]`. `vorder` is the order of variables entering into model. Note that `Intercept` is counted here as the first to enter into the model and should be removed.
- To avoid reordering of the list of variable names in plotting, change the type of variable names from "character" to "factor" by `preds_enter <- factor(preds_enter, levels = preds_enter)`

```{r}
library(leaps)
regfit.fwd <- regsubsets(as.formula(paste(target, "~", paste(preds,collapse=" + "))), data = df, method = "forward", nvmax = 14)
reg.summary <- summary(regfit.fwd)

# get variables in their order added to the model
preds_enter <- preds[regfit.fwd$vorder[2:15]-1]

## create metrics data frame
df_metrics_auto <- data.frame(
  preds = factor(preds_enter, levels = preds_enter),
  rsq = reg.summary$rsq, 
  adj_rsq = reg.summary$adjr2, 
  aic = reg.summary$cp, 
  bic = reg.summary$bic
  )

ggplot() +
  geom_point(data = df_metrics_auto, aes(x = preds, y = rsq)) +
  geom_point(data = filter(df_metrics_auto, rsq == max(rsq)), aes(x = preds, y = rsq), color = "red") +
  labs(title = expression(italic(R)^2)) + 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics_auto, aes(x = preds, y = adj_rsq)) +
  geom_point(data = filter(df_metrics_auto, adj_rsq == max(adj_rsq)), aes(x = preds, y = adj_rsq), color = "red") +
  labs(title = expression(paste("Adjusted-", italic(R)^2))) + 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics_auto, aes(x = preds, y = aic)) +
  geom_point(data = filter(df_metrics_auto, aic == min(aic)), aes(x = preds, y = aic), color = "red") +
  labs(title = "AIC")+ 
  coord_flip()

ggplot() +
  geom_point(data = df_metrics_auto, aes(x = preds, y = bic)) +
  geom_point(data = filter(df_metrics_auto, bic == min(bic)), aes(x = preds, y = bic), color = "red") +
  labs(title = "BIC")+ 
  coord_flip()
```

### Bonus: Best Subset Selection - Not included in application for students (too long...)

First we generate our data.

```{r}
set.seed(41)

#data size
n = 500

#predictors
x1 = rnorm(n)
x2 = rnorm(n)
x3 = rnorm(n)

#generate output
y = 4*x1 + 3*x3 + rnorm(n)

df_data = data.frame(y = y, x1 = x1,x2 = x2, x3 = x3)
head(df_data)
```
The true underline model uses as predictors the variables x1 and x3. However in practise we do not know this information. What we really see is the df_data file. For this exercise we want to find those variables that are most informatives. Since we have 3 available predictors we can create 7 different models using all possible combinations of predictors. This type of variable selection is reffered as _Best Subset Selection_.

```{r}
formulas = c('y~x1','y~x2','y~x3','y~x1+x2','y~x1+x3','y~x2+x3','y~x1+x2+x3')
cat("All possible models \n",paste(formulas,collapse = ' , '),'\n')

```

In order to find the optimal model we will use two methods:
 
 a. BIC
 b. Cross Validated MSE

We note here that the BIC criterion uses all the available data while the MSE uses the cross validated test data.

#### BIC
```{r}
#create vector that stores the BIC results for all the models
bic_s = rep(NA,length(formulas))

#iterate over the models
for (i in 1:length(formulas)){
  #construct formula
  formula = as.formula(formulas[i])
  
  #fit linear model
  lm_fit = lm(formula,data = df_data)
  
  #calculate bic
  bic_s[i] = BIC(lm_fit)
}

```

find the optimal model according to BIC
```{r}
#find the optimal model according to BIC
cat("Optimal Model BIC: ",formulas[which.min(bic_s)])
```

#### Cross Validated MSE
```{r}
#shuffle the data
shuffled_id = sample(x = 1:n, size = n)
df_data = df_data[shuffled_id,]

#specify the number of folds
n_folds = 5

#folds assignments (in which folder each point is assigned)
folds = cut(1:n,breaks = n_folds,labels = FALSE)

#create matrix to store the MSE from each test folder for all models : dimension 7 x n_folds
mse_s = matrix(NA,nrow = length(formulas),ncol = n_folds)

#iterate over the models
for (i in 1:length(formulas)){
  #construct formula
  formula = as.formula(formulas[i])
  
  #cross validation for the i-th model
  for(j in 1:n_folds){
    #take train indixes for all folders except j
    train_ind = (folds != j)
    #take test indexes for folder j
    test_ind = (folds == j)
    
    #train data that contains all the points that do not belong in folder j
    train_data = df_data[train_ind,]
    #test data that contains all the points that belong to folder j
    test_data = df_data[test_ind,]
    
    #fit linear model using the training data
    lm_fit = lm(formula, data = train_data)
    
    # calculate mse on test data
    ## make predictions
    y_pred = predict(lm_fit,test_data)
    #calculate mse of test folder --> jth folder
    mse = mean((y_pred - test_data$y)^2)
    
    mse_s[i,j] = mse
  }
}  
```

Check the MSE error for each test folder for every model.
```{r}
mse_s
```

Calculate Cross Validated MSE for every model.
```{r}
cv_mse = rowMeans(mse_s)
cv_mse
```

Find the optimal model according to Cross Validated MSE.
```{r}
#find the optimal model according to CV 
cat("Optimal Model CV_MSE: ",formulas[which.min(cv_mse)])
```