en_Model Evaluation in Regression Models.srt
0
00:00:00,630 --> 00:00:02,560
Hello, and welcome!
1
00:00:02,560 --> 00:00:08,040
In this video, we’ll be covering model evaluation. So, let’s get started.
2
00:00:08,039 --> 00:00:13,840
The goal of regression is to build a model to accurately predict an unknown case.
3
00:00:13,840 --> 00:00:19,749
To this end, we have to perform regression evaluation after building the model.
4
00:00:19,749 --> 00:00:24,909
In this video, we’ll introduce and discuss two types of evaluation approaches that can
5
00:00:24,909 --> 00:00:27,619
be used to achieve this goal.
6
00:00:27,619 --> 00:00:34,610
These approaches are: train and test on the same dataset, and train/test split.
7
00:00:34,610 --> 00:00:40,340
We’ll talk about what each of these is, as well as the pros and cons of using each
8
00:00:40,340 --> 00:00:41,860
of these approaches.
9
00:00:41,860 --> 00:00:46,850
Also, we’ll introduce some metrics for accuracy of regression models.
10
00:00:46,850 --> 00:00:50,230
Let’s look at the first approach.
11
00:00:50,230 --> 00:00:54,900
When considering evaluation models, we clearly want to choose the one that will give us the
12
00:00:54,900 --> 00:00:56,880
most accurate results.
13
00:00:56,880 --> 00:01:02,010
So, the question is, how can we calculate the accuracy of our model?
14
00:01:02,010 --> 00:01:07,450
In other words, how much can we trust this model for prediction of an unknown sample,
15
00:01:07,450 --> 00:01:13,530
using a given dataset and having built a model such as linear regression?
16
00:01:13,530 --> 00:01:18,549
One of the solutions is to select a portion of our dataset for testing.
17
00:01:18,549 --> 00:01:23,179
For instance, assume that we have 10 records in our dataset.
18
00:01:23,179 --> 00:01:29,100
We use the entire dataset for training, and we build a model using this training set.
19
00:01:29,100 --> 00:01:36,759
Now, we select a small portion of the dataset, such as row numbers 6 to 9, but without the
20
00:01:36,759 --> 00:01:38,229
labels.
21
00:01:38,229 --> 00:01:43,139
This set is called a test set. It has the labels, but the labels are not used for
22
00:01:43,139 --> 00:01:46,180
prediction; they are used only as ground truth.
23
00:01:46,180 --> 00:01:50,679
The labels are called “Actual values” of the test set.
24
00:01:50,679 --> 00:01:56,280
Now, we pass the feature set of the testing portion to our built model, and predict the
25
00:01:56,280 --> 00:01:58,259
target values.
26
00:01:58,259 --> 00:02:04,640
Finally, we compare the predicted values by our model with the actual values in the test
27
00:02:04,640 --> 00:02:06,270
set.
28
00:02:06,270 --> 00:02:09,830
This indicates how accurate our model actually is.
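As a rough sketch of this procedure (the video shows no code, so the Python library, the column names, and the values below are illustrative assumptions, not the course's own example):

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical 10-row dataset; "engine_size" and "co2" are made-up names.
df = pd.DataFrame({
    "engine_size": [1.0, 1.4, 1.6, 2.0, 2.4, 3.0, 3.5, 4.0, 4.4, 5.0],
    "co2":         [196, 221, 236, 255, 244, 230, 232, 255, 267, 270],
})

# Train on the ENTIRE dataset (all 10 rows).
model = LinearRegression()
model.fit(df[["engine_size"]], df["co2"])

# Test on a small portion of the SAME dataset, e.g. rows 6 to 9;
# the labels are kept only as ground truth for comparison.
test = df.iloc[6:10]
predicted = model.predict(test[["engine_size"]])
actual = test["co2"].values
print(predicted, actual)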
29
00:02:09,830 --> 00:02:15,140
There are different metrics to report the accuracy of the model, but most of them work,
30
00:02:15,140 --> 00:02:19,760
in general, by measuring the similarity between the predicted and actual values.
31
00:02:19,760 --> 00:02:25,500
Let’s look at one of the simplest metrics to calculate the accuracy of our regression
32
00:02:25,500 --> 00:02:26,500
model.
33
00:02:26,500 --> 00:02:32,980
As mentioned, we just compare the actual values, y, with the predicted values, which are denoted
34
00:02:32,980 --> 00:02:35,610
as ŷ, for the testing set.
35
00:02:35,610 --> 00:02:40,860
The error of the model is calculated as the average difference between the predicted and
36
00:02:40,860 --> 00:02:43,830
actual values for all the rows.
37
00:02:43,830 --> 00:02:46,760
We can write this error as an equation.
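One way to write it, assuming the "average difference" here means the mean absolute error (the formula itself is only shown on screen, not in this transcript), is:

\mathrm{Error} = \frac{1}{n} \sum_{j=1}^{n} \left| y_j - \hat{y}_j \right|

where n is the number of rows in the test set, y_j is the actual value, and ŷ_j is the predicted value for row j.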
38
00:02:46,760 --> 00:02:52,420
So, the first evaluation approach we just talked about is the simplest one: train and
39
00:02:52,420 --> 00:02:55,010
test on the SAME dataset.
40
00:02:55,010 --> 00:03:00,330
Essentially, the name of this approach says it all … you train the model on the entire
41
00:03:00,330 --> 00:03:05,680
dataset, then you test it using a portion of the same dataset.
42
00:03:05,680 --> 00:03:10,470
In a general sense, when you test with a dataset in which you know the target value for each
43
00:03:10,470 --> 00:03:16,580
data point, you’re able to obtain a percentage of accurate predictions for the model.
44
00:03:16,580 --> 00:03:21,340
This evaluation approach would most likely have a high “training accuracy” and a
45
00:03:21,340 --> 00:03:26,890
low “out-of-sample accuracy”, since the model knows all of the testing data points
46
00:03:26,890 --> 00:03:29,050
from the training.
47
00:03:29,050 --> 00:03:33,910
What is training accuracy and out-of-sample accuracy?
48
00:03:33,910 --> 00:03:39,450
We said that training and testing on the same dataset produces a high training accuracy,
49
00:03:39,450 --> 00:03:43,560
but what exactly is "training accuracy?"
50
00:03:43,560 --> 00:03:48,860
Training accuracy is the percentage of correct predictions that the model makes when using
51
00:03:48,860 --> 00:03:50,580
the same dataset it was trained on.
52
00:03:50,580 --> 00:03:56,580
However, a high training accuracy isn’t necessarily a good thing.
53
00:03:56,580 --> 00:04:03,220
For instance, a high training accuracy may be a sign that the model has ‘over-fit’ the data.
54
00:04:03,220 --> 00:04:08,140
This means that the model is overly trained to the dataset, which may capture noise and
55
00:04:08,140 --> 00:04:11,970
produce a non-generalized model.
56
00:04:11,970 --> 00:04:16,630
Out-of-sample accuracy is the percentage of correct predictions that the model makes on
57
00:04:16,630 --> 00:04:20,380
data that the model has NOT been trained on.
58
00:04:20,380 --> 00:04:27,090
Doing a “train and test” on the same dataset will most likely have low out-of-sample accuracy
59
00:04:27,090 --> 00:04:28,720
due to the likelihood of being over-fit.
60
00:04:28,720 --> 00:04:34,790
It’s important that our models have a high out-of-sample accuracy, because the purpose
61
00:04:34,790 --> 00:04:40,430
of our model is, of course, to make correct predictions on unknown data.
62
00:04:40,430 --> 00:04:44,670
So, how can we improve out-of-sample accuracy?
63
00:04:44,670 --> 00:04:50,490
One way is to use another evaluation approach called "Train/Test Split."
64
00:04:50,490 --> 00:04:52,590
In this approach, we select a portion of our
65
00:04:52,590 --> 00:04:57,150
dataset for training, for example, rows 0 to 5.
66
00:04:57,150 --> 00:05:01,360
And the rest is used for testing, for example, rows 6 to 9.
67
00:05:01,360 --> 00:05:04,030
The model is built on the training set.
68
00:05:04,030 --> 00:05:08,660
Then, the test feature set is passed to the model for prediction.
69
00:05:08,660 --> 00:05:14,040
And finally, the predicted values for the test set are compared with the actual values
70
00:05:14,040 --> 00:05:16,210
of the testing set.
71
00:05:16,210 --> 00:05:21,000
This second evaluation approach is called "Train/Test Split."
72
00:05:21,000 --> 00:05:27,440
Train/Test Split involves splitting the dataset into training and testing sets
73
00:05:27,440 --> 00:05:32,980
that are mutually exclusive. After that, you train with the training set and test with
74
00:05:32,980 --> 00:05:34,620
the testing set.
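A minimal sketch of this split (again assuming scikit-learn and the hypothetical df from the earlier sketch; neither appears in the video):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Mutually exclusive training and testing sets; a 60/40 split roughly
# mirrors the "rows 0 to 5" vs "rows 6 to 9" example above.
X_train, X_test, y_train, y_test = train_test_split(
    df[["engine_size"]], df["co2"], test_size=0.4, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)                 # train only on the training set
predicted = model.predict(X_test)           # predict on the unseen test features
error = (y_test - predicted).abs().mean()   # mean absolute error
print(error)

After this evaluation, the model can be retrained on the full dataset, as the video notes a little further on, so that no data is wasted.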
75
00:05:34,620 --> 00:05:40,080
This will provide a more accurate evaluation of out-of-sample accuracy because the testing
76
00:05:40,080 --> 00:05:44,370
dataset is NOT part of the data that has been used to train the model.
77
00:05:44,370 --> 00:05:48,630
It is more realistic for real world problems.
78
00:05:48,630 --> 00:05:53,460
This means that we know the outcome of each data point in this dataset, making it great
79
00:05:53,460 --> 00:05:55,120
to test with!
80
00:05:55,120 --> 00:05:59,980
And since this data has not been used to train the model, the model has no knowledge of the
81
00:05:59,980 --> 00:06:01,980
outcome of these data points.
82
00:06:01,980 --> 00:06:05,940
So, in essence, it’s truly out-of-sample testing.
83
00:06:05,940 --> 00:06:11,431
However, please ensure that you train your model with the testing set afterwards, as
84
00:06:11,431 --> 00:06:15,290
you don’t want to lose potentially valuable data.
85
00:06:15,290 --> 00:06:20,480
The issue with train/test split is that the result is highly dependent on which data ends up in
86
00:06:20,480 --> 00:06:23,430
the training and testing sets.
87
00:06:23,430 --> 00:06:29,220
Even so, train/test split gives a better estimate of out-of-sample accuracy
88
00:06:29,220 --> 00:06:35,050
than training and testing on the same dataset, but it still has some problems due to this
89
00:06:35,050 --> 00:06:36,919
dependency.
90
00:06:36,919 --> 00:06:44,050
Another evaluation model, called "K-Fold Cross-validation," resolves most of these issues.
91
00:06:44,050 --> 00:06:47,700
How do you fix a high variation that results from a dependency?
92
00:06:47,700 --> 00:06:49,940
Well, you average it.
93
00:06:49,940 --> 00:06:55,660
Let me explain the basic concept of “k-fold cross-validation” to see how we can solve
94
00:06:55,660 --> 00:06:57,390
this problem.
95
00:06:57,390 --> 00:07:02,680
The entire dataset is represented by the points in the image at the top left.
96
00:07:02,680 --> 00:07:09,080
If we have k=4 folds, then we split up this dataset as shown here.
97
00:07:09,080 --> 00:07:14,500
In the first fold, for example, we use the first 25 percent of the dataset for testing,
98
00:07:14,500 --> 00:07:16,800
and the rest for training.
99
00:07:16,800 --> 00:07:22,090
The model is built using the training set, and is evaluated using the test set.
100
00:07:22,090 --> 00:07:28,680
Then, in the next round (or in the second fold), the second 25 percent of the dataset
101
00:07:28,680 --> 00:07:32,699
is used for testing and the rest for training the model.
102
00:07:32,699 --> 00:07:36,860
Again, the accuracy of the model is calculated.
103
00:07:36,860 --> 00:07:39,510
We continue for all folds.
104
00:07:39,510 --> 00:07:43,980
Finally, the results of all 4 evaluations are averaged.
105
00:07:43,980 --> 00:07:49,680
That is, the accuracy of each fold is then averaged, keeping in mind that each fold is
106
00:07:49,680 --> 00:07:55,430
distinct, in that no test data from one fold is reused as test data in another.
107
00:07:55,430 --> 00:08:01,759
K-fold cross-validation, in its simplest form, performs multiple train/test splits using
108
00:08:01,759 --> 00:08:05,540
the same dataset where each split is different.
109
00:08:05,540 --> 00:08:11,830
Then, the result is averaged to produce a more consistent out-of-sample accuracy.
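A minimal sketch of 4-fold cross-validation and the averaging step (again assuming scikit-learn and the hypothetical df from the earlier sketches):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# cv=4: each fold holds out a different 25% of the data as the test set.
scores = cross_val_score(
    LinearRegression(), df[["engine_size"]], df["co2"],
    cv=4, scoring="neg_mean_absolute_error")

# The out-of-sample estimate is the average error over the 4 folds.
print(-scores.mean())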
110
00:08:11,830 --> 00:08:16,170
We wanted to show you an evaluation model that addressed some of the issues we’ve
111
00:08:16,170 --> 00:08:18,440
described with the previous approaches.
112
00:08:18,440 --> 00:08:25,710
However, going in-depth with the K-fold cross-validation model is beyond the scope of this course.
113
00:08:25,710 --> 00:08:26,430
Thanks for watching!