Part 10: Model Selection

Let edited this page Aug 25, 2021 · 6 revisions

Model selection is the process of choosing one final machine learning model from among a collection of candidates trained on a dataset, the one that best addresses the original problem statement (e.g. the business question the model is meant to answer).

Model selection evaluates and compares the candidate models in order to choose the best one; model assessment happens afterwards, estimating how well the chosen model can be expected to perform in general (i.e. on unseen data).

See the related notebook 10/k-fold_and_grid-search.ipynb.
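As a minimal sketch of how model selection and model assessment fit together (assuming scikit-learn; the dataset, estimator, and parameter grid here are illustrative, not taken from the notebook):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a test set for the final model *assessment*.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Model *selection*: 5-fold cross-validation over a small parameter grid.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=cv)
search.fit(X_train, y_train)

# Model *assessment*: estimate generalization on the untouched test set.
print(search.best_params_)
print(search.score(X_test, y_test))
```

Note that the test set plays no part in selection; it is reserved for assessing the winner.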

Note that plain k-fold is different from its extension, stratified k-fold:

(Source: Z² Little, 2020)

Instead of filling each test fold with an arbitrary slice of the data, stratified k-fold samples from every class so that each fold preserves the class proportions of the full dataset. As with plain k-fold, the data is shuffled at most once, before splitting, so the test folds never overlap.
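A small sketch of that property, assuming scikit-learn (the imbalanced toy labels are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels: 80% class 0, 20% class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))  # features are irrelevant to the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Every test fold keeps the dataset's 80/20 class ratio.
    print(np.bincount(y[test_idx]))  # → [16  4] in each of the 5 folds
```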

Or from the stratified shuffle split:

(Source: Z² Little, 2020)

In shuffle split, the data is re-shuffled before every split, which means the test sets may overlap across splits (unlike k-fold, where the test folds are disjoint).
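The overlap is easy to demonstrate with scikit-learn's plain `ShuffleSplit` (the sizes here are chosen so the overlap is guaranteed by a pigeonhole argument, not by a particular random seed):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(20).reshape(-1, 1)

# test_size=12 is more than half of 20 samples, so any two test sets
# must share at least 12 + 12 - 20 = 4 samples.
ss = ShuffleSplit(n_splits=3, test_size=12, random_state=0)
test_sets = [set(test_idx) for _, test_idx in ss.split(X)]

print(len(test_sets[0] & test_sets[1]))  # overlap size: at least 4
```

With k-fold, the same intersection would always be empty.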

See the related notebook 10/aic_and_bic.ipynb for a simple application.
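For least-squares fits with Gaussian errors, a common form of the two criteria is AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), where k is the number of fitted parameters. A minimal NumPy sketch comparing polynomial degrees (the toy data is illustrative, not from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.1, size=x.size)  # true degree: 2

def aic_bic(y_true, y_pred, k):
    """AIC and BIC for a least-squares fit with Gaussian errors."""
    n = y_true.size
    rss = np.sum((y_true - y_pred) ** 2)
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

results = {}
for degree in (1, 2, 3):
    coefs = np.polyfit(x, y, degree)
    k = degree + 1                            # fitted coefficients
    results[degree] = aic_bic(y, np.polyval(coefs, x), k)
    print(degree, results[degree])            # lower is better for both
```

Both criteria trade goodness of fit against complexity; BIC's k·ln(n) penalty is harsher than AIC's 2k whenever n > e² ≈ 7.4, so BIC tends to favor simpler models.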

See the related notebooks 10/xg_boost-classifier.ipynb (for a classifier) and 10/xg_boost-regressor.ipynb (for a regressor).
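The notebooks use XGBoost; since the `xgboost` package may not be installed everywhere, this sketch substitutes scikit-learn's `GradientBoostingClassifier`, which implements the same gradient-boosted-trees idea (XGBoost's `XGBClassifier` exposes a compatible fit/predict interface). The dataset and hyperparameters are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Stand-in for xgboost.XGBClassifier; same boosting workflow.
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0)

# Evaluate with stratified 5-fold CV, tying back to the splitters above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(scores.mean())
```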