Neural Networks
The architecture of a very simple feedforward neural net can be seen in this graphic:
Each layer of a neural net implements the ParametricTensorFunction protocol. The NeuralNetLayer base class can be used as an input layer. It has no parameters; the input is used directly as the preactivations (denoted with the letter "a" in the graphic). These preactivations are fed into the activationFunction ("σ" in the graphic), a property of the layer. Currently available activation functions are the sigmoid function, (leaky) rectified linear units, and the softplus function. The output of the activation function ("h") is stored as currentActivation, since it will be needed when calculating the gradient. These activations are the output of the layer.
For this layer, the gradient with respect to the input is calculated by multiplying the gradient of the activations with respect to the preactivations with the gradient with respect to the output (i.e. the activations). This last gradient in turn has to be provided by the following layer.
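To make this chain rule concrete, here is a small standalone sketch in plain Swift, using the sigmoid activation as an example. It illustrates the math only and is not the library's actual implementation: since h = σ(a), the gradient with respect to the preactivations is σ(a)·(1 − σ(a)) multiplied elementwise with the gradient with respect to the output.
import Foundation
//illustration only: elementwise sigmoid σ(a) = 1 / (1 + exp(-a))
func sigmoid(a: [Float]) -> [Float] {
    return a.map {1 / (1 + exp(-$0))}
}
//chain rule for a layer without parameters: dC/da = σ'(a) * dC/dh, with σ'(a) = σ(a) * (1 - σ(a))
func sigmoidGradientWrtPreactivations(a: [Float], gradientWrtOutput: [Float]) -> [Float] {
    let h = sigmoid(a)
    return zip(h, gradientWrtOutput).map {$0 * (1 - $0) * $1}
}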
FeedforwardLayer is a subclass of NeuralNetLayer. It has two parameters: a matrix of weights ("W" in the graphic) and a bias vector ("b"). Here the preactivations are calculated by multiplying the input with the weights and adding the bias.
This additional step has to be taken into account when calculating the gradient with respect to the input: the gradient with respect to the preactivations of this layer has to be multiplied with the weights. In addition, the gradients with respect to both parameters have to be calculated. The gradient with respect to the bias is simply the preactivation gradient. The weight gradient is calculated by multiplying the preactivation gradient with the input of this layer, that is, the stored activations of the previous layer.
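For illustration (again plain Swift, not the library's implementation): with preactivations a = W·x + b and the preactivation gradient g = dC/da, the three gradients of a feedforward layer are dC/db = g, dC/dW = g ⊗ x (the outer product with the layer input), and dC/dx = Wᵀ·g, which is handed on to the previous layer.
//illustration only: gradients of a feedforward layer with preactivations a = W * x + b,
//given the preactivation gradient g = dC/da of this layer
func feedforwardGradients(weights: [[Float]], input: [Float], preactivationGradient: [Float]) -> (wrtWeights: [[Float]], wrtBias: [Float], wrtInput: [Float]) {
    //the bias gradient is simply the preactivation gradient
    let wrtBias = preactivationGradient
    //the weight gradient is the outer product of the preactivation gradient and the input
    let wrtWeights = preactivationGradient.map {gi in input.map {xi in gi * xi}}
    //the input gradient is the transposed weight matrix multiplied with the preactivation gradient
    let wrtInput = (0..<input.count).map {j in
        zip(weights, preactivationGradient).reduce(Float(0)) {$0 + $1.0[j] * $1.1}
    }
    return (wrtWeights, wrtBias, wrtInput)
}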
All layers of a neural net are encapsulated in the NeuralNet class, which represents the whole neural network and also follows the ParametricTensorFunction protocol. The individual layers are stored in an array called layers. The parameters of the individual layers are synthesized into one flat array, which serves as the parameter array of the NeuralNet.
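Conceptually, this synthesis could look like the following sketch (the exact property names of the protocol are assumptions):
//sketch with assumed property names: the parameters of the whole net are the
//concatenated parameters of the individual layers, in layer order
var parameters: [Tensor<Float>] {
    return layers.flatMap {$0.parameters}
}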
Calculating the output of the last layer from the input of the first layer can simply be done in one line using reduce():
let output = layers.reduce(input, combine: {$1.output($0)})
Backpropagation of the gradients can be done very similarly on the reversed layers array. Additionally, the gradients with respect to the parameters have to be collected along the way:
var gradientWrtParameters: [Tensor<Float>] = []
let gradientWrtInput = layers.reverse().reduce(gradientWrtOutput) { (currentGradientWrtOutput, currentLayer) -> Tensor<Float> in
    //compute the gradients of this layer from the gradient with respect to its output
    let gradients = currentLayer.gradients(currentGradientWrtOutput)
    //prepend the parameter gradients so that they stay in the same order as the parameters
    gradientWrtParameters.insertContentsOf(gradients.wrtParameters, at: 0)
    //the input gradient becomes the output gradient of the previous layer
    return gradients.wrtInput
}
NeuralNet can be initialized by giving it an array of layer sizes. The weights of the feedforward layers are then automatically initialized with small random values.
To train a neural net, we must set up a cost function, e.g. SquaredErrorCost. The neural net has to be the estimator of this cost function. For the parameters of the neural net, optional regularizers of type ParameterRegularizer can be set. For the weight parameters, the ParameterDecay regularizer might be a good idea. The cost function, and with it the neural net estimator, can then be trained with stochastic gradient descent.
A complete setup of a neural net trained to recognize MNIST digits can look like this:
var estimator = NeuralNet(layerSizes: [28*28, 40, 10])
//change the activation functions of the first two layers to rectified linear units
estimator.layers[0].activationFunction = ReLU(secondarySlope: 0.01)
estimator.layers[1].activationFunction = ReLU(secondarySlope: 0.01)
var neuralNetCost = SquaredErrorCost(forEstimator: estimator)
//add parameter decay for the weights of layers 1 and 2
let regularizer = ParameterDecay(decayRate: 0.0001)
neuralNetCost.regularizers[0] = regularizer
neuralNetCost.regularizers[2] = regularizer
//train the neural net for 30 epochs
stochasticGradientDescent(neuralNetCost, inputs: trainingData[.a, .b], targets: trainingLabels[.a, .c], updateRate: 0.1, minibatchSize: 50, validationCallback: ({ (epoch, estimator) -> (Bool) in
    if(epoch >= 30) {return true}
    else {return false}
}))
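After training, the estimator can be used like any other ParametricTensorFunction. Assuming test data indexed with the same modes as the training data above, predictions could be computed with a call along these lines (the exact signature is an assumption):
//sketch: compute the output of the trained net for unseen test data
let predictions = estimator.output(testData[.a, .b])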