
Neural Networks


feedforward neural network

The architecture of a very simple feedforward neural net can be seen in this graphic:
[Graphic: feedforward net]

input layer

Each layer of a neural net implements the ParametricTensorFunction protocol. The NeuralNetLayer base class can be used as an input layer. It has no parameters; the input is used directly as the preactivations (denoted by "a" in the graphic).
The preactivations are fed into the activationFunction ("σ" in the graphic), a property of the layer. Currently available activation functions are the sigmoid function, (leaky) rectified linear units, and the softplus function. The output of the activation function ("h") is stored as currentActivation, since it will be needed when calculating the gradient. These activations are the output of the layer. The gradient with respect to the input is calculated by multiplying the gradient of the activations with respect to the preactivations by the gradient with respect to the output (i.e. the activations), which in turn must be provided by the following layer.
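
A rough sketch of how a single layer might be used on its own (the empty NeuralNetLayer() initializer and the input and gradientWrtOutput tensors are assumptions for illustration; output(_:), gradients(_:) and activationFunction are taken from the description above):

//an input layer has no parameters, the input is passed directly to the activation function
var inputLayer = NeuralNetLayer() //assumed default initializer
inputLayer.activationFunction = ReLU(secondarySlope: 0.01) //σ can be exchanged, here for a leaky ReLU

//forward pass: h = σ(a), also stored as currentActivation
let activation = inputLayer.output(input)

//backward pass: the following layer provides the gradient with respect to h,
//this layer multiplies it with the gradient of h with respect to a
let gradients = inputLayer.gradients(gradientWrtOutput)
let gradientWrtInput = gradients.wrtInput //wrtParameters is empty, since this layer has no parameters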

feedforward layer

FeedforwardLayer is a subclass of NeuralNetLayer. It has two parameters: a matrix of weights ("W" in the graphic) and a bias vector ("b"). Here the preactivations are calculated by multiplying the input with the weights and adding the bias.
This additional step must be taken into account when calculating the gradient with respect to the input: the gradient with respect to the preactivations of this layer has to be multiplied with the weights. In addition, the gradients with respect to both parameters must be calculated. The gradient with respect to the bias is simply the preactivation gradient. The weight gradient is calculated by multiplying the preactivation gradient with the input of this layer, i.e. the stored activations of the previous layer.
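
To make these steps concrete, here is a minimal sketch of the same computation on plain Swift arrays; it is not the library's Tensor-based implementation, and all function and parameter names are only illustrative:

//illustrative only: a dense layer on plain Swift arrays instead of the library's tensors
func feedforward(input x: [Float], weights W: [[Float]], bias b: [Float]) -> [Float] {
    //preactivation a = W·x + b
    var a = b
    for i in 0..<W.count {
        for j in 0..<x.count {
            a[i] += W[i][j] * x[j]
        }
    }
    return a
}

//gradients, given the preactivation gradient gA (the output gradient already multiplied with σ')
func feedforwardGradients(input x: [Float], weights W: [[Float]], preactivationGradient gA: [Float])
    -> (wrtWeights: [[Float]], wrtBias: [Float], wrtInput: [Float]) {
    let wrtWeights = gA.map { g in x.map { g * $0 } } //outer product gA ⊗ x
    let wrtBias = gA                                  //bias gradient equals the preactivation gradient
    var wrtInput = [Float](count: x.count, repeatedValue: 0)
    for i in 0..<W.count {
        for j in 0..<x.count {
            wrtInput[j] += W[i][j] * gA[i]            //Wᵀ · gA, passed on to the previous layer
        }
    }
    return (wrtWeights, wrtBias, wrtInput)
}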

neural net

All layers of a neural net are encapsulated in the NeuralNet class, which represents the whole network and also conforms to the ParametricTensorFunction protocol. The different layers are stored in an array called layers. The parameters of the individual layers are gathered into one flat array, which serves as the parameters of the NeuralNet.
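
A sketch of how this flat array might be assembled, assuming every layer exposes its own parameters through the ParametricTensorFunction protocol (the parameters property name is an assumption):

//inside NeuralNet: concatenate the parameters of all layers into one flat array,
//e.g. [W1, b1, W2, b2] for a net with two feedforward layers
var parameters: [Tensor<Float>] {
    return layers.reduce([Tensor<Float>]()) {$0 + $1.parameters}
}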
Calculating the output of the last layer from the input of the first layer can simply be done in one line using reduce():

let output = layers.reduce(input, combine: {$1.output($0)})

Backpropagation of the gradients can be done in a very similar way on the reversed layers array. Additionally, the gradients with respect to the parameters have to be collected along the way.

var gradientWrtParameters: [Tensor<Float>] = []
//walk through the layers in reverse order, passing the gradient from layer to layer
let gradientWrtInput = layers.reverse().reduce(gradientWrtOutput) { (currentGradientWrtOutput, currentLayer) -> Tensor<Float> in
    let gradients = currentLayer.gradients(currentGradientWrtOutput)
    //prepend the parameter gradients, so they end up in the same order as the parameters
    gradientWrtParameters.insertContentsOf(gradients.wrtParameters, at: 0)
    return gradients.wrtInput
}

NeuralNet can be initialized by giving it an array of layer sizes. The weights of the feedforward layers are then automatically initialized with small random values.

cost function

To train a neural net, we set up a cost function, e.g. SquaredErrorCost, with the neural net as its estimator. Optional regularizers of type ParameterRegularizer can be set for the individual parameters of the neural net; for the weight parameters, the ParameterDecay regularizer might be a good idea. The cost function, and with it the neural net estimator, can then be trained with stochastic gradient descent.

A complete setup of a neural net, trained to recognize MNIST digits, can look like this:

var estimator = NeuralNet(layerSizes: [28*28, 40, 10])
//change the activation functions of the first two layers to rectified linear units
estimator.layers[0].activationFunction = ReLU(secondarySlope: 0.01) 
estimator.layers[1].activationFunction = ReLU(secondarySlope: 0.01)

var neuralNetCost = SquaredErrorCost(forEstimator: estimator)
//add parameter decay for the weights of layers 1 and 2
let regularizer = ParameterDecay(decayRate: 0.0001)
neuralNetCost.regularizers[0] = regularizer
neuralNetCost.regularizers[2] = regularizer

//train the neural net for 30 epochs
stochasticGradientDescent(neuralNetCost, inputs: trainingData[.a, .b], targets: trainingLabels[.a, .c], updateRate: 0.1, minibatchSize: 50, validationCallback: ({ (epoch, estimator) -> (Bool) in
    return epoch >= 30
}))