Hyperparameter Tuning

Introduction

Hyperparameters control the learning process of a machine learning algorithm. Among other things, they determine how complex the model is allowed to become. Poorly chosen hyperparameters can lead to:

  • Overfitting - the model is too complex; this is also known as high variance.
  • Underfitting - the model is too simple; this is also known as high bias.

Manually tuning the hyperparameters of a machine learning algorithm can be very time-consuming, especially when the number of hyperparameters grows. It is therefore preferable to use an optimizer and let the computer do the work.

In SharpLearning, there are several different optimizers in SharpLearning.Optimization:

  • GridSearch
  • RandomSearch
  • ParticleSwarm
  • GlobalizedBoundedNelderMead
  • BayesianOptimization

In this guide we are going to use the RandomSearchOptimizer to tune the hyperparameters of a GradientBoostLearner. We are going to use the wine quality data set also used in the Introduction to SharpLearning. The full code examples from this guide can be found in SharpLearning.Examples.

Using the RandomSearchOptimizer to tune Hyperparameters

We are going to tune the hyperparameters of a RegressionSquareLossGradientBoostLearner for scoring the quality of white wine. Before starting the search for better hyperparameters, let's get a baseline of how the model performs with its default parameters. As always, we start by splitting the data set into a training set and a test set, so we can use the test set as an estimate of how well the model generalizes to new data.
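
The baseline code below assumes that a trainSet and a testSet already exist. The following is a minimal sketch of how they could be created, assuming the wine quality csv file is available locally (the file name is an assumption) and using the CsvParser from SharpLearning.InputOutput:

// load the wine quality data set (the UCI version is ';'-separated).
var parser = new CsvParser(() => new StreamReader("winequality-white.csv"), ';');
var targetName = "quality";

// read the feature matrix and the target vector.
var observations = parser.EnumerateRows(c => c != targetName).ToF64Matrix();
var targets = parser.EnumerateRows(targetName).ToF64Vector();

// split the data set into a 70% training set and a 30% test set.
var split = new RandomTrainingTestIndexSplitter<double>(trainingPercentage: 0.7, seed: 24)
    .SplitSet(observations, targets);
var trainSet = split.TrainingSet;
var testSet = split.TestSet;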

// create learner with default parameters
var learner = new RegressionSquareLossGradientBoostLearner(runParallel: false);

// learn model with default parameters
var model = learner.Learn(trainSet.Observations, trainSet.Targets);

// predict the training and test set.
var trainPredictions = model.Predict(trainSet.Observations);
var testPredictions = model.Predict(testSet.Observations);

// since this is a regression problem we are using square error as metric
// for evaluating how well the model performs.
var metric = new MeanSquaredErrorRegressionMetric();

// measure the error on the test set.
var testError = metric.Error(testSet.Targets, testPredictions);

Algorithm                                              Test Error
RegressionSquareLossGradientBoostLearner (default)     0.4984

Now that we have our baseline, let's set up the RandomSearchOptimizer to tune the hyperparameters for better results.

The optimizer needs to know the parameter specification of each hyperparameter we are going to tune. The RegressionSquareLossGradientBoostLearner has several hyperparameters:

// Parameter specs for the optimizer
var parameters = new IParameterSpec[]
{
    new MinMaxParameterSpec(min: 80, max: 300, 
        transform: Transform.Linear, parameterType: ParameterType.Discrete), // iterations

    new MinMaxParameterSpec(min: 0.02, max:  0.2, 
        transform: Transform.Logarithmic, parameterType: ParameterType.Continuous), // learning rate

    new MinMaxParameterSpec(min: 8, max: 15, 
        transform: Transform.Linear, parameterType: ParameterType.Discrete), // maximumTreeDepth

    new MinMaxParameterSpec(min: 0.5, max: 0.9, 
        transform: Transform.Linear, parameterType: ParameterType.Continuous), // subSampleRatio

    // numberOfFeatures is the number of feature columns in the data set,
    // defined elsewhere in the full example.
    new MinMaxParameterSpec(min: 1, max: numberOfFeatures, 
        transform: Transform.Linear, parameterType: ParameterType.Discrete), // featuresPrSplit
};

Notice that the learning rate uses a Logarithmic transform for sampling new candidate values. The range spans an order of magnitude, from 0.02 to 0.2, so sampling uniformly on a linear scale would concentrate most candidates at the upper end of the range. Sampling in log space gives the small values equal coverage.
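
Conceptually, the logarithmic transform samples uniformly in log space and maps the result back to the original scale. The snippet below is a sketch of the idea, not SharpLearning's internal implementation:

// sample uniformly between ln(min) and ln(max), then transform back
// to the original scale via the exponential function.
var random = new Random(42);
double min = 0.02, max = 0.2;
var logSample = Math.Log(min) + random.NextDouble() * (Math.Log(max) - Math.Log(min));
var learningRateCandidate = Math.Exp(logSample);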

Each MinMaxParameterSpec is also assigned a ParameterType, which tells the sampler used by the optimizer whether the parameter should be sampled from a discrete or continuous range.
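
For a discrete parameter, the sampled value can be thought of as rounded to the nearest whole number before it is used. Again, a sketch of the idea rather than the library's exact implementation:

// a raw continuous sample within the maximumTreeDepth range [8, 15].
var rawSample = 12.63;
// a discrete parameter yields a whole number, e.g. a maximumTreeDepth of 13.
var treeDepthCandidate = (int)Math.Round(rawSample);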

To evaluate each candidate set of hyperparameters during the optimization, we further split the training data to get a validation set, leaving the current test set out of the optimization entirely. If we optimized directly on the error of the test set, we would risk a positive bias in our final error estimate:

// Further split the training data to have a validation set to measure
// how well the model generalizes to unseen data during the optimization.
var validationSplit = new RandomTrainingTestIndexSplitter<double>(trainingPercentage: 0.7, seed: 24)
    .SplitSet(trainSet.Observations, trainSet.Targets);

The optimizer also needs an objective function, which learns a candidate RegressionGradientBoostModel using the current set of hyperparameters and evaluates its performance on the validation set, here using the mean squared error metric. The objective function takes as input a double[] containing the candidate set of hyperparameters and returns an OptimizerResult containing the validation error and the corresponding set of hyperparameters:

// Define optimizer objective (function to minimize)
Func<double[], OptimizerResult> minimize = p =>
{
    // create the candidate learner using the current optimization parameters.
    var candidateLearner = new RegressionSquareLossGradientBoostLearner(
        iterations: (int)p[0],
        learningRate: p[1], 
        maximumTreeDepth: (int)p[2], 
        subSampleRatio: p[3], 
        featuresPrSplit: (int)p[4],
        runParallel: false);

    // learn a candidate model on the training part of the validation split.
    var candidateModel = candidateLearner.Learn(validationSplit.TrainingSet.Observations,
        validationSplit.TrainingSet.Targets);

    // measure the candidate's error on the validation set.
    var validationPredictions = candidateModel.Predict(validationSplit.TestSet.Observations);
    var candidateError = metric.Error(validationSplit.TestSet.Targets, validationPredictions);

    return new OptimizerResult(p, candidateError);
};

When the objective function is defined, we can create and run the optimizer to find the best set of hyperparameters. We are going to let the RandomSearchOptimizer run for 30 iterations, trying one set of hyperparameters per iteration. The hyperparameters are sampled randomly within the bounds we defined earlier:

// create optimizer
var optimizer = new RandomSearchOptimizer(parameters, iterations: 30, runParallel: true);

// find best hyperparameters
var result = optimizer.OptimizeBest(minimize);
var best = result.ParameterSet;
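
Besides the parameter set, the OptimizerResult returned by OptimizeBest also carries the error the best parameter set achieved, matching the OptimizerResult we construct in the objective function. A small usage sketch:

// the validation error achieved by the best parameter set.
var bestValidationError = result.Error;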

Running the optimizer finds the following hyperparameter set:

  • iterations: 277
  • learningRate: 0.035
  • maximumTreeDepth: 15
  • subSampleRatio: 0.838
  • featuresPrSplit: 4

After the optimizer has found the best set of hyperparameters, measured on the validation set, we can create a learner using these parameters and learn a final RegressionGradientBoostModel on the full training set:

// create the final learner using the best hyperparameters.
var learner = new RegressionSquareLossGradientBoostLearner(
                iterations: (int)best[0],
                learningRate: best[1], 
                maximumTreeDepth: (int)best[2], 
                subSampleRatio: best[3],
                featuresPrSplit: (int)best[4], 
                runParallel: false);

// learn model with found parameters
var model = learner.Learn(trainSet.Observations, trainSet.Targets);
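
To measure how well the tuned model generalizes, we evaluate it on the held-out test set, following the same pattern as the baseline evaluation:

// predict the test set with the final model.
var testPredictions = model.Predict(testSet.Observations);

// measure the error on the test set.
var testError = metric.Error(testSet.Targets, testPredictions);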

Using the found hyperparameters, the RegressionSquareLossGradientBoostLearner is able to reduce the test error significantly, from 0.4984 to 0.3852:

Algorithm                                              Test Error
RegressionSquareLossGradientBoostLearner (default)     0.4984
RegressionSquareLossGradientBoostLearner (Optimizer)   0.3852