
Parametric Functions and Optimization


parametric tensor function

ParametricTensorFunction is a protocol defining a differentiable parametric function that takes a tensor as input, produces a tensor as output, and has an array of tensors as parameters. Besides the output function, an object conforming to the ParametricTensorFunction protocol must implement a method that, given the gradient with respect to the output, calculates the gradient with respect to the input and to each parameter. Additionally, an update method for the parameters must be implemented. A ParametricTensorFunction can itself wrap a computational graph of parametric tensor functions. For example, a whole neural net is a parametric tensor function, as is each individual layer of that net.
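
A minimal sketch of what such a protocol could look like is shown below. The placeholder Tensor type, the method names output and updateParameters, and the exact signatures are assumptions for illustration, not the library's actual declarations:

```swift
// Placeholder tensor type so the sketch is self-contained; the real library
// presumably defines its own Tensor type.
typealias Tensor = [Float]

// Hypothetical shape of the protocol; method names and signatures are assumptions.
protocol ParametricTensorFunction {
    /// The trainable parameters of this function.
    var parameters: [Tensor] { get }

    /// Forward pass: compute the output tensor for a given input tensor.
    func output(_ input: Tensor) -> Tensor

    /// Backward pass: given the gradient of the cost with respect to this
    /// function's output, return the gradient with respect to the input
    /// and the gradients with respect to each parameter.
    func gradients(_ gradientWrtOutput: Tensor) -> (wrtInput: Tensor, wrtParameters: [Tensor])

    /// Update step: for example, subtract scaled gradients from each parameter.
    mutating func updateParameters(subtracting updates: [Tensor])
}
```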

cost function

The cost function evaluates the output tensor of an estimator by comparing it to a target tensor. The estimator can be any parametric tensor function. The costForEstimate() method evaluates an estimated tensor and returns a single, non-negative scalar: the cost of the given estimate. Good estimates should result in small costs. gradientForEstimate() calculates the gradient of the cost with respect to the estimate. This gradient can be fed into the gradients() method of the estimator as the starting point for gradient backpropagation.
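
The following sketch shows how a simple squared-error cost could implement these two methods. The SquaredErrorCost type and its signatures are illustrative assumptions; only the method names costForEstimate() and gradientForEstimate() come from the description above:

```swift
// Tensor is the [Float] placeholder from the protocol sketch above.
struct SquaredErrorCost {
    /// Returns a single non-negative scalar: half the squared distance
    /// between estimate and target.
    func costForEstimate(_ estimate: Tensor, target: Tensor) -> Float {
        var cost: Float = 0
        for (e, t) in zip(estimate, target) {
            cost += 0.5 * (e - t) * (e - t)
        }
        return cost
    }

    /// Gradient of the cost with respect to the estimate, used as the
    /// starting point for backpropagation through the estimator.
    func gradientForEstimate(_ estimate: Tensor, target: Tensor) -> Tensor {
        return zip(estimate, target).map { pair in pair.0 - pair.1 }
    }
}
```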

The cost function automatically provides a dedicated update method for the parameters of the estimator. This update method factors in a regularizer for each parameter. These regularizers (such as weight decay) can be set independently for each parameter and impose additional constraints on the parameters during optimization.
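
As a rough illustration of how such a regularizer could enter the update step, weight decay adds a scaled copy of the parameter to its gradient before the subtraction. The regularizedUpdate helper below is hypothetical, not part of the library:

```swift
// Illustrative only: with weight decay the effective update becomes
//   parameter ← parameter − learningRate · (gradient + weightDecay · parameter)
func regularizedUpdate(parameter: Tensor, gradient: Tensor,
                       learningRate: Float, weightDecay: Float) -> Tensor {
    var updated = parameter
    for i in parameter.indices {
        updated[i] = parameter[i] - learningRate * (gradient[i] + weightDecay * parameter[i])
    }
    return updated
}
```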

Furthermore, a method for computing the gradient with respect to each parameter of the estimator is provided automatically: numericalGradients() calculates the gradients by slightly perturbing every value of every parameter separately. This is, of course, computationally very expensive and should only be used to check the correctness of gradients calculated with backpropagation.
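
A central-difference version of such a numerical gradient check might look like the following. The numericalGradient helper and the scalar cost closure are assumptions for illustration, not the library's numericalGradients() implementation:

```swift
// Numerical gradient of a scalar-valued cost over a flat parameter vector,
// via central differences. Needs two cost evaluations per parameter value,
// so it is only useful for verifying backpropagated gradients on small problems.
func numericalGradient(of cost: (Tensor) -> Float, at parameter: Tensor,
                       epsilon: Float = 1e-4) -> Tensor {
    var gradient = Tensor(repeating: 0, count: parameter.count)
    for i in parameter.indices {
        var plus = parameter
        plus[i] += epsilon
        var minus = parameter
        minus[i] -= epsilon
        gradient[i] = (cost(plus) - cost(minus)) / (2 * epsilon)
    }
    return gradient
}
```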

stochastic gradient descent

The estimator of a cost function can be trained using stochasticGradientDescent(). The output of the given cost function, called the objective in this context, is reduced by tweaking the parameters of the estimator. The algorithm takes a whole training set of samples and their corresponding targets as input. For each minibatch of samples, the cost gradients with respect to the parameters are calculated, and the parameters are updated by subtracting a fraction of these gradients from them. Minibatches are slices of the training data. Once all training samples have been used, the closure validationCallback() is called. By returning true, it can terminate the algorithm. If it returns false, the training samples are shuffled and the optimization continues. One cycle through all training samples with the minibatches is called an epoch.
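
A rough sketch of this training loop is given below. The function name, parameter names, and the step closure are assumptions standing in for the library's actual stochasticGradientDescent() implementation:

```swift
// Hedged sketch of minibatch stochastic gradient descent with an epoch-wise
// validation callback; Tensor is the [Float] placeholder from the sketches above.
func stochasticGradientDescentSketch(
    samples: [Tensor],
    targets: [Tensor],
    minibatchSize: Int,
    learningRate: Float,
    step: ([Tensor], [Tensor], Float) -> Void,   // backprop + parameter update for one minibatch
    validationCallback: () -> Bool               // returns true to terminate training
) {
    var order = Array(samples.indices)
    while true {
        // One epoch: cycle through all training samples in minibatch slices.
        for start in stride(from: 0, to: order.count, by: minibatchSize) {
            let batch = order[start..<min(start + minibatchSize, order.count)]
            let batchSamples = batch.map { samples[$0] }
            let batchTargets = batch.map { targets[$0] }
            // Calculate the cost gradients with respect to the parameters and
            // subtract a fraction (the learning rate) of them from the parameters.
            step(batchSamples, batchTargets, learningRate)
        }
        // After every epoch, the validation callback can end the optimization.
        if validationCallback() { break }
        // Otherwise shuffle the training samples and continue.
        order.shuffle()
    }
}
```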