The basic unit of work in a neural network is the Artificial Neuron. An Artificial Neuron has an associated potential to emit a signal. For convenience the value of the potential is kept between 0 and 1: the closer the potential is to 1 the more active the neuron is, and the closer it is to 0 the more inactive. We can implement the Artificial Neuron as a function with an array of activation values held in its internal scope. The function's parameter is an array of weight values, and its output is the emitted signal. Both arrays are defined as tensors because the operations and functions used to manipulate Artificial Neurons come from a branch of mathematics called Tensor Analysis. Consider an implementation based on the following steps:
- We define two tensors of the same length: the activation values $a$ and the weight values $w$.
- Multiply tensors $a$ and $w$ component by component, i.e. $a_i w_i$.
- The tensor product will be $p = (a_1 w_1, \dots, a_n w_n)$.
- Reduce $p$ to a scalar value by adding its components.
- The sum of the components is $s = a_1 w_1 + \dots + a_n w_n$.
- $s$ is called a weighted sum and is represented by $s = \sum_{i=1}^{n} a_i w_i$, where $n$ is the number of elements in $a$.
- $s$ determines the strength of the signal emitted by the Artificial Neuron.
- Capping adds additional control over signal emission and is done by subtracting a bias $b$ from the sum, i.e. $s - b$.
- It is possible for $s - b$ to have a value outside the desired signal strength range of 0 to 1. For this reason an activation function is used to bring it into the desired range.
- One of the commonly used activation functions is the sigmoid, $\sigma(x) = \frac{1}{1 + e^{-x}}$.
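In conclusion, the implementation of an Artificial Neuron is the function $f(w) = \sigma\left(\sum_{i=1}^{n} a_i w_i - b\right)$, where $a$ holds the activation values, $w$ the weight values, $b$ is the bias, and $\sigma$ is the activation function.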
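
The description above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: `createNeuron` and its parameter names are hypothetical and do not come from any library, and the sigmoid is chosen as the activation function. The activation values and the bias live in the closure (the internal scope), while the weight values arrive as the function parameter.

```javascript
// Sigmoid activation: maps any real number into the open interval (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Hypothetical factory (not a library API): returns a neuron function that
// keeps its activation values and bias in its internal scope.
function createNeuron(activationValues, bias) {
  return function neuron(weightValues) {
    if (weightValues.length !== activationValues.length) {
      throw new Error("weight and activation tensors must have the same length");
    }
    // Component-wise product, then reduce to a scalar: the weighted sum.
    const weightedSum = activationValues
      .map((a, i) => a * weightValues[i])
      .reduce((sum, x) => sum + x, 0);
    // Subtract the bias, then bring the result into the range (0, 1).
    return sigmoid(weightedSum - bias);
  };
}

// Illustrative numbers only.
const neuron = createNeuron([0.5, -0.25], 0.1);
const signal = neuron([1, 0.4]);
console.log(signal); // sigmoid(0.3), roughly 0.574
```

Keeping the activation values in a closure mirrors the "internal scope" wording of the text; a class or a plain object would work equally well.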
A neural network is a computational graph of Artificial Neurons. Neural networks are composed of neural network layers. A neural network layer is a tensor of Artificial Neurons. The Artificial Neurons in a neural network layer are connected to each other because they are components of a tensor; layer n is simply the tensor of the Artificial Neurons it contains. Neural networks have three layer types: input, hidden, and output. A neural network may have multiple hidden layers but only one input layer and one output layer. Consider a neural network consisting of the following layers: an input layer with one Artificial Neuron, a hidden layer with two, and an output layer with one.
Neural networks themselves are tensors. In this case the neural network is the tensor whose components are its layers. Artificial Neurons in a neural network are associated with each other via function composition. Consider the first Artificial Neuron of the hidden layer: it has an internal tensor of activation values with a single component, and its output is a potential. The key question one must ask at this point is: how is the number of activation values in a hidden-layer neuron related to the input layer? Here is where the magic happens: the output of the input-layer neuron becomes the input weight for both hidden-layer neurons. In other words, the input layer's output becomes the weight value tensor, and therefore the input, of each of them. This means that the activation values tensor of each hidden-layer neuron has one component because the input layer consists of only one component. It is important to notice that the number of activation values in a layer's Artificial Neurons is determined by the number of Artificial Neurons in the previous layer.
For completeness let's consider the output-layer Artificial Neuron. Based on our current understanding, it has an internal tensor of two activation values because the hidden layer has two components. Each of the two hidden-layer neurons outputs a potential, therefore the weight value tensor of the output neuron consists of those two potentials. The weighted sum for the output neuron is the sum of each potential multiplied by the matching activation value, and its potential is the activation function applied to that sum minus the bias. Notice how all Artificial Neurons in every layer of the neural network are relaying information, i.e. emitting a signal directly or indirectly to each other in a forward direction. The type of neural network where all Artificial Neurons are connected to each other is called a dense neural network.
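
The layer-to-layer relay described above can be sketched as follows. This is a minimal illustration, not the implementation used by any framework: the object shape `{ values, bias }`, the function names `fire` and `forward`, and all numbers are assumptions made for the example. The input layer is assumed to pass its value through unchanged, so only the hidden and output layers are stored.

```javascript
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// A neuron is stored as { values, bias } (hypothetical shape); `values` plays
// the role of the internal activation values tensor from the text.
function fire(neuron, inputs) {
  const weightedSum = neuron.values.reduce((acc, v, i) => acc + v * inputs[i], 0);
  return sigmoid(weightedSum - neuron.bias);
}

// Forward pass: the signals of one layer become the input of the next.
function forward(layers, input) {
  return layers.reduce(
    (signals, layer) => layer.map((neuron) => fire(neuron, signals)),
    input
  );
}

// A 1-2-1 network: two hidden neurons with one value each (one input-layer
// component), one output neuron with two values (two hidden components).
const network = [
  [
    { values: [0.8], bias: 0.1 },
    { values: [-0.4], bias: 0.0 },
  ],
  [{ values: [0.3, 0.9], bias: 0.2 }],
];

const output = forward(network, [0.5]);
console.log(output); // an array with one potential between 0 and 1
```

Note how the length of each neuron's `values` array equals the number of neurons in the previous layer, which is the relationship the text points out.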
We define a neural network algorithm as a function that produces an output in response to an input by passing it through n hidden layers, i.e. as the composition of the input layer, the n hidden layers, and the output layer. A neural network is a system defined by the tensor of those layers.
In our daily experience we go through time and we have a state at each moment in time. Our reality is a series of moments in time. At each moment we can assess our state, map any number of metrics to that exact moment, and persist the resulting information representing our state. Our memories are our state and we derive knowledge from them. Compared to you or me, a neural network is a very simple system: a moment in time for it is represented by evaluating it at a given value of the input. We bring the network to life by feeding it input and evaluating the output of every Artificial Neuron in each neural network layer.
Back propagation is the most widely used algorithm for training neural networks. The algorithm's objective is to find, through a training process, the optimal values for every Artificial Neuron in the network, i.e. the values that will yield the expected outputs. The algorithm's steps are:
- Initialize the Artificial Neurons in the network by assigning a random value, within a fixed range, to each of their stored values and to each bias.
- Iterate over the training dataset.
- For each item in the dataset:
  - Forward propagate by invoking the activation function on every Artificial Neuron, from the first hidden layer to the output layer, using the input value of the item as the input. The signals of the Artificial Neurons in a previous layer become the input for the current layer.
  - Backwards propagate the error by iterating over the layers in reverse order and calculating the error between the current output and the expected output, i.e. the output for the corresponding input in the dataset. One of the most commonly used error, cost, or loss functions to compare the two is the Mean Squared Error, $E = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $y_i$ is an expected output and $\hat{y}_i$ is the corresponding current output. The error indicates how close the signal is to the expected output.
  - Compute the rate of change of the cost function. The rate of change of a single-variable function with one scalar output is called a derivative, i.e. $\frac{df}{dx}$. The rate of change of a multi-variable function with one scalar output is called the gradient, i.e. $\nabla f$. The gradient indicates the direction and magnitude of greatest increase of the error function. In this case the gradient $\nabla E$ needs to be computed since we are dealing with multi-variable tensors.
  - The gradient needs to be negated because the objective is to advance towards lower error or cost, i.e. $-\nabla E$.
  - Define the learning rate, a positive number used as a factor that determines the magnitude of each change in conjunction with the gradient. The magnitude of the learning rate determines how big a step we take in our search to minimize the error or cost.
  - Define the momentum, a number between 0 and 1 used as a factor that determines the magnitude of each change in conjunction with the previous change. The magnitude of the momentum determines how much influence the previous changes have in our search to minimize the error or cost.
  - Compute the scalar values by which each stored value needs to change in order to decrease the error, i.e. to bring the current output closer to the expected output. Follow the same procedure to fine-tune the bias.
- After iterating over the complete training data set, verify whether the current error is less than or equal to the error threshold or the maximum number of iterations was reached. If so, stop training; otherwise continue. Each complete iteration over all items in a training set is called an epoch.
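
The steps above can be sketched as a small stochastic gradient descent loop with momentum. Everything here is an assumption made for illustration: the function and property names are hypothetical, the initialization range of -1 to 1 and the default hyper parameter values are arbitrary, and the per-item update uses the squared error of that item (its constant factor is absorbed by the learning rate). It is not the code of any particular framework.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Random initialization; the range [-1, 1) is an assumption of this sketch.
function createNetwork(sizes) {
  const rand = () => Math.random() * 2 - 1;
  return sizes.slice(1).map((size, l) =>
    Array.from({ length: size }, () => ({
      values: Array.from({ length: sizes[l] }, rand),
      bias: rand(),
      prevValueSteps: new Array(sizes[l]).fill(0), // remembered for momentum
      prevBiasStep: 0,
    }))
  );
}

// Forward propagate, keeping every layer's signals (index 0 is the input).
function forward(network, input) {
  const signals = [input];
  for (const layer of network) {
    const previous = signals[signals.length - 1];
    signals.push(layer.map((n) =>
      sigmoid(n.values.reduce((s, v, i) => s + v * previous[i], 0) - n.bias)));
  }
  return signals;
}

function meanSquaredError(network, data) {
  let total = 0;
  let count = 0;
  for (const { input, output } of data) {
    const actual = forward(network, input).pop();
    actual.forEach((a, i) => { total += (a - output[i]) ** 2; count++; });
  }
  return total / count;
}

function trainItem(network, item, learningRate, momentum) {
  const signals = forward(network, item.input);
  const deltas = new Array(network.length);
  // Backward pass: error signal of every neuron, last layer first.
  for (let l = network.length - 1; l >= 0; l--) {
    deltas[l] = network[l].map((_, j) => {
      const out = signals[l + 1][j];
      const error = l === network.length - 1
        ? out - item.output[j]
        : network[l + 1].reduce((s, n, k) => s + deltas[l + 1][k] * n.values[j], 0);
      return error * out * (1 - out); // multiplied by the sigmoid derivative
    });
  }
  // Update: step against the gradient plus a fraction of the previous step.
  network.forEach((layer, l) => layer.forEach((n, j) => {
    n.values.forEach((_, i) => {
      const step = -learningRate * deltas[l][j] * signals[l][i]
        + momentum * n.prevValueSteps[i];
      n.values[i] += step;
      n.prevValueSteps[i] = step;
    });
    // The bias is subtracted inside the neuron, so its gradient flips sign.
    const biasStep = learningRate * deltas[l][j] + momentum * n.prevBiasStep;
    n.bias += biasStep;
    n.prevBiasStep = biasStep;
  }));
}

function train(network, data, options = {}) {
  const { learningRate = 0.3, momentum = 0.1,
    errorThreshold = 0.005, maxIterations = 20000 } = options;
  let epoch = 0;
  let error = meanSquaredError(network, data);
  while (error > errorThreshold && epoch < maxIterations) {
    for (const item of data) trainItem(network, item, learningRate, momentum);
    error = meanSquaredError(network, data);
    epoch++;
  }
  return { epoch, error };
}

// Made-up, already normalized training items.
const data = [
  { input: [0], output: [0.1] },
  { input: [0.5], output: [0.5] },
  { input: [1], output: [0.9] },
];
const network = createNetwork([1, 2, 1]);
const errorBefore = meanSquaredError(network, data);
const result = train(network, data);
console.log(errorBefore, result);
```

All error signals are computed before any value is updated, so each hidden-layer error is based on the values that actually produced the output. Because initialization is random, the number of epochs needed differs from run to run.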
It is crucial to understand that the values stored in each Artificial Neuron and the associated biases change as a result of back propagation, while the signals change as a result of forward propagation. This means that properly labeled data is essential for training and for how well the network performs. When practicing machine learning you will be presented with the opportunity to adjust so-called hyper parameters; some of them are the learning rate, the momentum, the error threshold, and the maximum number of iterations.
The process of preparing training data sets is challenging. The key to the process is proper vectorization and labeling of training data. Neural networks can be applied to all kinds of problems involving regression, classification, or prediction. The way data is prepared for training requires careful consideration of the domain and the goals one intends to achieve.
Imagine we have a set of data representing the horsepower and the miles per gallon of a given model. An array holding the model, its horsepower, and its miles per gallon represents one element in our raw dataset. Our objective is to determine if there is a relationship between horsepower and miles per gallon and to design a neural network that will help us predict the miles per gallon given the horsepower. To prepare the data for consumption we need to understand what the inputs and outputs of our model are. Since our intent is to predict miles per gallon in relationship to horsepower regardless of the model, our training data becomes a set of horsepower and miles-per-gallon pairs. The last step in the process is data normalization, which is usually accomplished by min-max feature scaling. The function for min-max feature scaling is $x' = \frac{x - \min(X)}{\max(X) - \min(X)}$, where $x$ is any value, $\max(X)$ is the maximum, and $\min(X)$ is the minimum in the array $X$. Normalization assures that the value is always within the range $[0, 1]$. In our case study the input is the normalized horsepower and the expected output is the normalized miles per gallon. Normalization is necessary because it brings any data set into the range the Artificial Neurons operate in, $[0, 1]$.
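
The preparation steps above can be sketched as follows. The rows are made-up numbers chosen so the arithmetic is easy to follow, not real measurements, and the names `minMaxScale`, `raw`, and `trainingData` are hypothetical. The `{ input, output }` shape of each training item is also an assumption of this sketch.

```javascript
// Min-max feature scaling: maps every value of the array into [0, 1].
function minMaxScale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Assumption of this sketch: a constant column is mapped to all zeros
  // to avoid dividing by zero.
  if (max === min) return values.map(() => 0);
  return values.map((x) => (x - min) / (max - min));
}

// Made-up raw rows: [model, horsepower, milesPerGallon].
const raw = [
  ["A", 100, 30],
  ["B", 150, 25],
  ["C", 200, 20],
];

// Drop the model, keep the two numeric columns, and normalize each one.
const horsepower = minMaxScale(raw.map((row) => row[1])); // [0, 0.5, 1]
const milesPerGallon = minMaxScale(raw.map((row) => row[2])); // [1, 0.5, 0]

// One training item per raw row: normalized input and expected output.
const trainingData = raw.map((_, i) => ({
  input: [horsepower[i]],
  output: [milesPerGallon[i]],
}));
console.log(trainingData);
```

To turn a prediction back into miles per gallon, the scaling can be inverted with the same minimum and maximum, so those two numbers are worth keeping alongside the trained network.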
Neural networks are computational graphs used to universally model functions. The majority of relationships represented by functions are not linear. For this reason logistic functions such as the sigmoid are used to modulate signals: they introduce nonlinearity to the Artificial Neuron model, which increases the scope of problems we can solve. Normalization helps by keeping everything at the same scale and allows the system to be more sensitive when recognizing patterns. When a neural network is trained it becomes a function in tensor form specific to the training domain. After training, the acquired knowledge can be preserved by serializing the trained values, the associated biases, and all the hyper parameters used during training. The resulting kernel of knowledge is very small in comparison to the training data and can be used almost anywhere, including a web browser. When utilizing neural networks for regression, prediction, or classification the activation values come from the input provided and the weights are not changed. The activation values flow forward from the input layer to the output layer.
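
As a rough illustration of preserving a trained network, the sketch below serializes a small model to JSON and restores it. The model shape (`layers`, `values`, `bias`, `hyperParameters`), the `run` function, and all numbers are assumptions made for the example; a framework will have its own format and helpers.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Forward pass only: nothing stored in the model is modified.
function run(model, input) {
  return model.layers.reduce(
    (signals, layer) => layer.map((n) =>
      sigmoid(n.values.reduce((s, v, i) => s + v * signals[i], 0) - n.bias)),
    input
  );
}

// Hypothetical trained model with illustrative numbers.
const trained = {
  layers: [
    [{ values: [0.8], bias: 0.1 }, { values: [-0.4], bias: 0 }],
    [{ values: [0.3, 0.9], bias: 0.2 }],
  ],
  hyperParameters: { learningRate: 0.3, momentum: 0.1 },
};

// Serialize to a string that can be saved to a file or sent to a browser.
const json = JSON.stringify(trained);

// Later, possibly in another environment: restore and use the model.
const restored = JSON.parse(json);
console.log(json.length, run(restored, [0.5]));
```

The serialized string contains only a handful of numbers, which is why the stored knowledge is so much smaller than the data it was trained on.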
In the examples directory you can find several examples using the Brain.JS framework to create and train neural networks.