en_Model Evaluation in Regression Models.srt
0
00:00:00,630 --> 00:00:02,560
Hello, and welcome!
1
00:00:02,560 --> 00:00:08,040
In this video, we’ll be covering model evaluation. So, let’s get started.
2
00:00:08,039 --> 00:00:13,840
The goal of regression is to build a model to accurately predict an unknown case.
3
00:00:13,840 --> 00:00:19,749
To this end, we have to perform regression evaluation after building the model.
4
00:00:19,749 --> 00:00:24,909
In this video, we’ll introduce and discuss two types of evaluation approaches that can
5
00:00:24,909 --> 00:00:27,619
be used to achieve this goal.
6
00:00:27,619 --> 00:00:34,610
These approaches are: train and test on the same dataset, and train/test split.
7
00:00:34,610 --> 00:00:40,340
We’ll talk about what each of these is, as well as the pros and cons of using each
8
00:00:40,340 --> 00:00:41,860
of these approaches.
9
00:00:41,860 --> 00:00:46,850
Also, we’ll introduce some metrics for accuracy of regression models.
10
00:00:46,850 --> 00:00:50,230
Let’s look at the first approach.
11
00:00:50,230 --> 00:00:54,900
When considering evaluation models, we clearly want to choose the one that will give us the
12
00:00:54,900 --> 00:00:56,880
most accurate results.
13
00:00:56,880 --> 00:01:02,010
So, the question is, how can we calculate the accuracy of our model?
14
00:01:02,010 --> 00:01:07,450
In other words, how much can we trust this model for prediction of an unknown sample,
15
00:01:07,450 --> 00:01:13,530
using a given dataset and having built a model such as linear regression?
16
00:01:13,530 --> 00:01:18,549
One of the solutions is to select a portion of our dataset for testing.
17
00:01:18,549 --> 00:01:23,179
For instance, assume that we have 10 records in our dataset.
18
00:01:23,179 --> 00:01:29,100
We use the entire dataset for training, and we build a model using this training set.
19
00:01:29,100 --> 00:01:36,759
Now, we select a small portion of the dataset, such as row numbers 6 to 9, but without the
20
00:01:36,759 --> 00:01:38,229
labels.
21
00:01:38,229 --> 00:01:43,139
This set is called a test set. It has the labels, but the labels are not used for
22
00:01:43,139 --> 00:01:46,180
prediction; they are used only as ground truth.
23
00:01:46,180 --> 00:01:50,679
The labels are called “Actual values” of the test set.
24
00:01:50,679 --> 00:01:56,280
Now, we pass the feature set of the testing portion to our built model, and predict the
25
00:01:56,280 --> 00:01:58,259
target values.
26
00:01:58,259 --> 00:02:04,640
Finally, we compare the predicted values by our model with the actual values in the test
27
00:02:04,640 --> 00:02:06,270
set.
28
00:02:06,270 --> 00:02:09,830
This indicates how accurate our model actually is.
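As a rough sketch of this procedure (the video shows no code, so the Python library, the column names, and the values below are illustrative assumptions, not the course's own example):

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical 10-row dataset; "engine_size" and "co2" are made-up names.
df = pd.DataFrame({
    "engine_size": [1.0, 1.4, 1.6, 2.0, 2.4, 3.0, 3.5, 4.0, 4.4, 5.0],
    "co2":         [196, 221, 236, 255, 244, 230, 232, 255, 267, 270],
})

# Train on the ENTIRE dataset (all 10 rows).
model = LinearRegression()
model.fit(df[["engine_size"]], df["co2"])

# Test on a small portion of the SAME dataset, e.g. rows 6 to 9;
# the labels are kept only as ground truth for comparison.
test = df.iloc[6:10]
predicted = model.predict(test[["engine_size"]])
actual = test["co2"].values
print(predicted, actual)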
29
00:02:09,830 --> 00:02:15,140
There are different metrics to report the accuracy of the model, but most of them work,
30
00:02:15,140 --> 00:02:19,760
in general, by measuring the similarity between the predicted and actual values.
31
00:02:19,760 --> 00:02:25,500
Let’s look at one of the simplest metrics to calculate the accuracy of our regression
32
00:02:25,500 --> 00:02:26,500
model.
33
00:02:26,500 --> 00:02:32,980
As mentioned, we just compare the actual values, y, with the predicted values, which are denoted
34
00:02:32,980 --> 00:02:35,610
as ŷ, for the testing set.
35
00:02:35,610 --> 00:02:40,860
The error of the model is calculated as the average difference between the predicted and
36
00:02:40,860 --> 00:02:43,830
actual values for all the rows.
37
00:02:43,830 --> 00:02:46,760
We can write this error as an equation.
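One way to write it, assuming the "average difference" here means the mean absolute error (the formula itself is only shown on screen, not in this transcript), is:

\mathrm{Error} = \frac{1}{n} \sum_{j=1}^{n} \left| y_j - \hat{y}_j \right|

where n is the number of rows in the test set, y_j is the actual value, and ŷ_j is the predicted value for row j.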
38
00:02:46,760 --> 00:02:52,420
So, the first evaluation approach we just talked about is the simplest one: train and
39
00:02:52,420 --> 00:02:55,010
test on the SAME dataset.
40
00:02:55,010 --> 00:03:00,330
Essentially, the name of this approach says it all … you train the model on the entire
41
00:03:00,330 --> 00:03:05,680
dataset, then you test it using a portion of the same dataset.
42
00:03:05,680 --> 00:03:10,470
In a general sense, when you test with a dataset in which you know the target value for each
43
00:03:10,470 --> 00:03:16,580
data point, you’re able to obtain a percentage of accurate predictions for the model.
44
00:03:16,580 --> 00:03:21,340
This evaluation approach would most likely have a high “training accuracy” and a
45
00:03:21,340 --> 00:03:26,890
low “out-of-sample accuracy”, since the model knows all of the testing data points
46
00:03:26,890 --> 00:03:29,050
from the training.
47
00:03:29,050 --> 00:03:33,910
What is training accuracy and out-of-sample accuracy?
48
00:03:33,910 --> 00:03:39,450
We said that training and testing on the same dataset produces a high training accuracy,
49
00:03:39,450 --> 00:03:43,560
but what exactly is "training accuracy?"
50
00:03:43,560 --> 00:03:48,860
Training accuracy is the percentage of correct predictions that the model makes when using
51
00:03:48,860 --> 00:03:50,580
the same dataset it was trained on.
52
00:03:50,580 --> 00:03:56,580
However, a high training accuracy isn’t necessarily a good thing.
53
00:03:56,580 --> 00:04:03,220
For instance, a high training accuracy may be a sign that the model has ‘over-fit’ the data.
54
00:04:03,220 --> 00:04:08,140
This means that the model is overly trained to the dataset, which may capture noise and
55
00:04:08,140 --> 00:04:11,970
produce a non-generalized model.
56
00:04:11,970 --> 00:04:16,630
Out-of-sample accuracy is the percentage of correct predictions that the model makes on
57
00:04:16,630 --> 00:04:20,380
data that the model has NOT been trained on.
58
00:04:20,380 --> 00:04:27,090
Doing a “train and test” on the same dataset will most likely have low out-of-sample accuracy
59
00:04:27,090 --> 00:04:28,720
due to the likelihood of being over-fit.
60
00:04:28,720 --> 00:04:34,790
It’s important that our models have a high out-of-sample accuracy, because the purpose
61
00:04:34,790 --> 00:04:40,430
of our model is, of course, to make correct predictions on unknown data.
62
00:04:40,430 --> 00:04:44,670
So, how can we improve out-of-sample accuracy?
63
00:04:44,670 --> 00:04:50,490
One way is to use another evaluation approach called "Train/Test Split."
64
00:04:50,490 --> 00:04:52,590
In this approach, we select a portion of our
65
00:04:52,590 --> 00:04:57,150
dataset for training, for example, rows 0 to 5.
66
00:04:57,150 --> 00:05:01,360
And the rest is used for testing, for example, rows 6 to 9.
67
00:05:01,360 --> 00:05:04,030
The model is built on the training set.
68
00:05:04,030 --> 00:05:08,660
Then, the test feature set is passed to the model for prediction.
69
00:05:08,660 --> 00:05:14,040
And finally, the predicted values for the test set are compared with the actual values
70
00:05:14,040 --> 00:05:16,210
of the testing set.
71
00:05:16,210 --> 00:05:21,000
This second evaluation approach is called "Train/Test Split."
72
00:05:21,000 --> 00:05:27,440
Train/Test Split involves splitting the dataset into training and testing sets
73
00:05:27,440 --> 00:05:32,980
that are mutually exclusive. After that, you train with the training set and test with
74
00:05:32,980 --> 00:05:34,620
the testing set.
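A minimal sketch of this split (again assuming scikit-learn and the hypothetical df from the earlier sketch; neither appears in the video):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Mutually exclusive training and testing sets; a 60/40 split roughly
# mirrors the "rows 0 to 5" vs "rows 6 to 9" example above.
X_train, X_test, y_train, y_test = train_test_split(
    df[["engine_size"]], df["co2"], test_size=0.4, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)                 # train only on the training set
predicted = model.predict(X_test)           # predict on the unseen test features
error = (y_test - predicted).abs().mean()   # mean absolute error
print(error)

After this evaluation, the model can be retrained on the full dataset, as the video notes a little further on, so that no data is wasted.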
75
00:05:34,620 --> 00:05:40,080
This will provide a more accurate evaluation of out-of-sample accuracy because the testing
76
00:05:40,080 --> 00:05:44,370
dataset is NOT part of the data that has been used to train the model.
77
00:05:44,370 --> 00:05:48,630
It is more realistic for real world problems.
78
00:05:48,630 --> 00:05:53,460
This means that we know the outcome of each data point in this dataset, making it great
79
00:05:53,460 --> 00:05:55,120
to test with!
80
00:05:55,120 --> 00:05:59,980
And since this data has not been used to train the model, the model has no knowledge of the
81
00:05:59,980 --> 00:06:01,980
outcome of these data points.
82
00:06:01,980 --> 00:06:05,940
So, in essence, it’s truly out-of-sample testing.
83
00:06:05,940 --> 00:06:11,431
However, please ensure that you train your model with the testing set afterwards, as
84
00:06:11,431 --> 00:06:15,290
you don’t want to lose potentially valuable data.
85
00:06:15,290 --> 00:06:20,480
The issue with train/test split is that the result is highly dependent on which data ends up in
86
00:06:20,480 --> 00:06:23,430
the training and testing sets.
87
00:06:23,430 --> 00:06:29,220
Even so, train/test split gives a better estimate of out-of-sample accuracy
88
00:06:29,220 --> 00:06:35,050
than training and testing on the same dataset, but it still has some problems due to this
89
00:06:35,050 --> 00:06:36,919
dependency.
90
00:06:36,919 --> 00:06:44,050
Another evaluation model, called "K-Fold Cross-validation," resolves most of these issues.
91
00:06:44,050 --> 00:06:47,700
How do you fix a high variation that results from a dependency?
92
00:06:47,700 --> 00:06:49,940
Well, you average it.
93
00:06:49,940 --> 00:06:55,660
Let me explain the basic concept of “k-fold cross-validation” to see how we can solve
94
00:06:55,660 --> 00:06:57,390
this problem.
95
00:06:57,390 --> 00:07:02,680
The entire dataset is represented by the points in the image at the top left.
96
00:07:02,680 --> 00:07:09,080
If we have k=4 folds, then we split up this dataset as shown here.
97
00:07:09,080 --> 00:07:14,500
In the first fold, for example, we use the first 25 percent of the dataset for testing,
98
00:07:14,500 --> 00:07:16,800
and the rest for training.
99
00:07:16,800 --> 00:07:22,090
The model is built using the training set, and is evaluated using the test set.
100
00:07:22,090 --> 00:07:28,680
Then, in the next round (or in the second fold), the second 25 percent of the dataset
101
00:07:28,680 --> 00:07:32,699
is used for testing and the rest for training the model.
102
00:07:32,699 --> 00:07:36,860
Again, the accuracy of the model is calculated.
103
00:07:36,860 --> 00:07:39,510
We continue for all folds.
104
00:07:39,510 --> 00:07:43,980
Finally, the results of all 4 evaluations are averaged.
105
00:07:43,980 --> 00:07:49,680
That is, the accuracy of each fold is then averaged, keeping in mind that each fold is
106
00:07:49,680 --> 00:07:55,430
distinct, in that no test data from one fold is reused as test data in another.
107
00:07:55,430 --> 00:08:01,759
K-fold cross-validation, in its simplest form, performs multiple train/test splits using
108
00:08:01,759 --> 00:08:05,540
the same dataset where each split is different.
109
00:08:05,540 --> 00:08:11,830
Then, the result is averaged to produce a more consistent out-of-sample accuracy.
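A minimal sketch of 4-fold cross-validation and the averaging step (again assuming scikit-learn and the hypothetical df from the earlier sketches):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# cv=4: each fold holds out a different 25% of the data as the test set.
scores = cross_val_score(
    LinearRegression(), df[["engine_size"]], df["co2"],
    cv=4, scoring="neg_mean_absolute_error")

# The out-of-sample estimate is the average error over the 4 folds.
print(-scores.mean())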
110
00:08:11,830 --> 00:08:16,170
We wanted to show you an evaluation model that addressed some of the issues we’ve
111
00:08:16,170 --> 00:08:18,440
described with the previous approaches.
112
00:08:18,440 --> 00:08:25,710
However, going in-depth with the K-fold cross-validation model is beyond the scope of this course.
113
00:08:25,710 --> 00:08:26,430
Thanks for watching!