An implementation of additional functionality for ConvNetJS, including various activation functions:
- LeakyReLU
- ELU
- FReLU
- GeLU
- PReLU
- RReLU
- PLU
- PiLU
- DoubleReLU
- Swish
- Mish
- Gish
- Logish
- Softplus
- Softmin
- Softsign
- Softshrink
- Hardshrink
Simply add the convnetjs-extras script along with the ConvNetJS library:
<script src="modules/convnet-min.js"></script>
<script src="modules/convnet-extras-min.js"></script>
Leaky Rectified Linear Unit (LeakyReLU) is an activation function that allows a small, non-zero gradient when the unit is not active, which helps keep information flowing through the network during training.
Definition: The Leaky Rectified Linear Unit (LeakyReLU) activation function is defined as

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha\,x & \text{otherwise} \end{cases}$$

Derivative:

$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ \alpha & \text{otherwise} \end{cases}$$

where $\alpha$ is the slope used for negative inputs (the alpha option below).

Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'leaky_relu', alpha: 0.1});
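To make the formulas concrete, the sketch below shows what a LeakyReLU activation computes element-wise on a ConvNetJS Vol during the forward and backward passes. The helper names are hypothetical and this is not the package's internal code, just an illustration of the math above.

```js
// Illustrative sketch only; not the convnetjs-extras implementation.
function leakyReluForward(v, alpha) {      // v is a convnetjs.Vol
  var out = v.clone();
  for (var i = 0; i < v.w.length; i++) {
    out.w[i] = v.w[i] > 0 ? v.w[i] : alpha * v.w[i];   // f(x)
  }
  return out;
}

function leakyReluBackward(v, out, alpha) {
  for (var i = 0; i < v.w.length; i++) {
    v.dw[i] = (v.w[i] > 0 ? 1 : alpha) * out.dw[i];    // chain rule with f'(x)
  }
}
```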
Exponential Linear Unit (ELU) is an activation function that tends to speed up convergence and produce more accurate results. It adds a smooth transition for negative inputs, reducing the vanishing gradient problem.
Definition: The Exponential Linear Unit (ELU) activation function is defined as

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha\,(e^{x} - 1) & \text{otherwise} \end{cases}$$

Derivative:

$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ \alpha\,e^{x} & \text{otherwise} \end{cases}$$

where $\alpha$ controls the value to which the function saturates for negative inputs (the alpha option below).

Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'elu', alpha: 0.01});
Flexible Rectified Linear Unit (FReLU) is a variant of the ReLU activation function that includes a fixed threshold parameter, adding flexibility to the learning process.
Definition: The Flexible Rectified Linear Unit (FReLU) activation function is defined as:
Derivative:
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'frelu'});
Swish is a smooth, non-monotonic activation function that tends to perform better than ReLU on deeper models. It is defined as the product of the input and its sigmoid function.
Definition: The Swish (or Sigmoid Linear Unit, SiLU) activation function is defined as

$$f(x) = x\,\sigma(x)$$

Derivative:

$$f'(x) = \sigma(x) + x\,\sigma(x)\big(1 - \sigma(x)\big) = f(x) + \sigma(x)\big(1 - f(x)\big)$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid.
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'swish'});
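The derivative identity above is easy to sanity-check numerically with a standalone Swish function (hypothetical helpers, not part of the library API):

```js
// Numeric check of f'(x) = f(x) + sigma(x) * (1 - f(x)); illustrative only.
function sigmoid(x) { return 1 / (1 + Math.exp(-x)); }
function swish(x)   { return x * sigmoid(x); }

var x = 0.7, eps = 1e-6;
var analytic = swish(x) + sigmoid(x) * (1 - swish(x));        // closed-form derivative
var numeric  = (swish(x + eps) - swish(x - eps)) / (2 * eps); // central finite difference
console.log(analytic, numeric);  // both ~0.8234; the two agree to roughly 1e-9
```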
Piecewise Linear Unit (PLU) is an activation function that allows for multiple linear segments, which can help in capturing complex patterns in the data.
Definition: The Piecewise Linear Unit (PLU) activation function is defined as:
Derivative:
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'plu'});
Piecewise Linear Unit (PiLU) is an activation function that introduces two linear segments with different slopes, creating a piecewise linear transition. It enhances the representational capacity of the network by allowing for more flexible and diverse transformations of the input data.
Definition: The Piecewise Linear Unit (PiLU) activation function is defined as:
Derivative:
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'pilu'});
DoubleReLU is an activation function composed of two rectified linear (ReLU) segments. It provides a simple yet effective way to introduce non-linearity into the network while maintaining computational efficiency, and the combination of two linear regions can capture more varied patterns than a single ReLU.
Definition: The Double Rectified Linear Unit (DoubleReLU) activation function is defined as:
Derivative:
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'double_relu', alpha:0.5});
Mish is a self-regularized activation function that smoothly interpolates between the linear and nonlinear regimes. It has shown promising results in various deep learning tasks, often outperforming traditional activation functions like ReLU.
Definition: The Mish activation function is defined as

$$f(x) = x \tanh\!\big(\operatorname{softplus}(x)\big) = x \tanh\!\big(\ln(1 + e^{x})\big)$$

Derivative:

$$f'(x) = \tanh\!\big(\ln(1 + e^{x})\big) + x\,\sigma(x)\,\operatorname{sech}^{2}\!\big(\ln(1 + e^{x})\big)$$
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'mish'});
Gish is a novel activation function that combines exponential and logarithmic transformations to provide robust non-linearity. It has demonstrated strong performance in deep learning models by effectively handling negative inputs and providing smooth gradients, which can enhance training stability and model performance.
Definition: The Gish activation function is defined as:
Derivative:
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'gish'});
Softplus is a smooth and continuous activation function defined as the logarithm of one plus the exponential of the input. It has the advantage of being differentiable everywhere, which allows for stable gradients during training.
Definition: The Softplus activation function is defined as

$$f(x) = \ln(1 + e^{x})$$

Derivative:

$$f'(x) = \frac{1}{1 + e^{-x}} = \sigma(x)$$
Example:
layer_defs.push({type:'softplus', num_classes:10});
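A direct implementation of log(1 + e^x) overflows for large inputs; the standard numerically stable rewrite is shown below (illustrative only, not necessarily how the layer computes it internally):

```js
// Illustrative helper; not the library's internal code.
function softplus(x) {
  // max(x, 0) + log(1 + e^(-|x|)) equals log(1 + e^x) but never overflows
  return Math.max(x, 0) + Math.log1p(Math.exp(-Math.abs(x)));
}
console.log(softplus(1000));                // 1000
console.log(Math.log(1 + Math.exp(1000)));  // Infinity (naive form overflows)
```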
Logish is an activation function that blends the properties of the logarithmic function with the sigmoid function. This function is designed to offer a balance between non-linearity and stability, potentially improving convergence and performance in various neural network architectures.
Definition: The Logish activation function is defined as

$$f(x) = x \ln\!\big(1 + \sigma(x)\big)$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid.

Derivative:

$$f'(x) = \ln\!\big(1 + \sigma(x)\big) + \frac{x\,\sigma(x)\big(1 - \sigma(x)\big)}{1 + \sigma(x)}$$
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'logish'});
Softmin is an activation function that applies the softmin operation to its inputs, effectively transforming them into a probability distribution where smaller values are amplified. It is particularly useful for tasks where the goal is to emphasize smaller input values.
Definition: The Softmin activation function, applied over a vector x, is defined as

$$\operatorname{softmin}(x)_i = \frac{e^{-x_i}}{\sum_j e^{-x_j}} = \operatorname{softmax}(-x)_i$$

Derivative:

$$\frac{\partial \operatorname{softmin}(x)_i}{\partial x_j} = \operatorname{softmin}(x)_i\big(\operatorname{softmin}(x)_j - \delta_{ij}\big)$$

where $\delta_{ij}$ is the Kronecker delta.
Example:
layer_defs.push({type:'softmin', num_classes:10});
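A standalone sketch of the same operation on a plain array (hypothetical helper, not the layer's actual API) makes the "smaller values are amplified" behaviour easy to see:

```js
// Illustrative helper; the layer applies the same idea to its input volume.
function softmin(values) {
  var exps = values.map(function (v) { return Math.exp(-v); });
  var sum  = exps.reduce(function (a, b) { return a + b; }, 0);
  return exps.map(function (e) { return e / sum; });
}
console.log(softmin([1, 2, 3]));  // [0.665..., 0.244..., 0.090...] -- the smallest input gets the highest probability
```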
Softsign is a smooth, differentiable activation function that approximates the sign function with a soft transition, which helps mitigate the problem of vanishing gradients and improves the learning dynamics of the network.
Definition: The Softsign activation function is defined as

$$f(x) = \frac{x}{1 + |x|}$$

Derivative:

$$f'(x) = \frac{1}{(1 + |x|)^{2}}$$
Example:
layer_defs.push({type:'softsign', num_classes:10});
Softshrink is a thresholding activation function that introduces sparsity by shrinking values towards zero. This function is useful for regularization and feature selection, as it helps to reduce the impact of small values and promote sparsity in the activations.
Definition: The Softshrink activation function is defined as

$$f(x) = \begin{cases} x - \lambda & \text{if } x > \lambda \\ x + \lambda & \text{if } x < -\lambda \\ 0 & \text{otherwise} \end{cases}$$

where $\lambda$ is the threshold set via the lambda option.

Derivative:

$$f'(x) = \begin{cases} 1 & \text{if } |x| > \lambda \\ 0 & \text{otherwise} \end{cases}$$
Example:
layer_defs.push({type:'softshrink', num_classes:10, lambda: 0.5});
Hardshrink is a simple threshold-based activation function that sets values inside a specific range to zero. It is effective in scenarios where you want to introduce sparsity by zeroing out small values within the range [-λ, λ] while passing larger values through unchanged.
Definition: The Hardshrink activation function is defined as

$$f(x) = \begin{cases} x & \text{if } |x| > \lambda \\ 0 & \text{otherwise} \end{cases}$$

where $\lambda$ is the threshold set via the lambda option.

Derivative:

$$f'(x) = \begin{cases} 1 & \text{if } |x| > \lambda \\ 0 & \text{otherwise} \end{cases}$$
Example:
layer_defs.push({type:'hardshrink', num_classes:10, lambda: 0.5});
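The two shrink functions are easiest to compare side by side; the standalone helpers below mirror the formulas above (illustrative only, with lambda = 0.5 as in the examples):

```js
// Illustrative helpers; not the library's internal code.
function softshrink(x, lambda) {
  if (x >  lambda) return x - lambda;
  if (x < -lambda) return x + lambda;
  return 0;
}
function hardshrink(x, lambda) {
  return Math.abs(x) > lambda ? x : 0;
}
console.log(softshrink(0.3, 0.5), softshrink(2.0, 0.5));  // 0, 1.5  (large values are shrunk toward zero)
console.log(hardshrink(0.3, 0.5), hardshrink(2.0, 0.5));  // 0, 2    (large values pass through unchanged)
```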
Gaussian Error Linear Unit (GeLU) is an activation function that applies a smooth curve to the input values, offering a balance between linearity and non-linearity. It has shown superior performance in various neural network architectures, particularly in natural language processing tasks.
Definition: The Gaussian Error Linear Unit (GeLU) activation function is defined as

$$f(x) = x\,\Phi(x)$$

Derivative:

$$f'(x) = \Phi(x) + x\,\phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution and $\phi(x)$ is its density. In practice the tanh approximation $f(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,(x + 0.044715\,x^{3})\right)\right)$ is often used.
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'gelu'});
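JavaScript's Math object has no erf, so the tanh approximation above is a convenient way to compute GeLU directly in code; the helper below is an illustrative sketch, not the package's internal implementation:

```js
// Illustrative helper using the tanh approximation of GeLU.
function gelu(x) {
  var c = Math.sqrt(2 / Math.PI);
  return 0.5 * x * (1 + Math.tanh(c * (x + 0.044715 * Math.pow(x, 3))));
}
console.log(gelu(1.0));   // ~0.841, close to the exact value x * Phi(x) = 0.8413
```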
Parametric Rectified Linear Unit (PReLU) is an activation function that introduces learnable parameters for the negative slope. It allows the model to adapt the negative part of the function during training, which can improve the model's capacity and performance.
Definition: The Parametric Rectified Linear Unit (PReLU) activation function is defined as

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ a\,x & \text{otherwise} \end{cases}$$

where $a$ is the negative-slope coefficient (see the alpha option below).

Derivative:

$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ a & \text{otherwise} \end{cases}$$
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'prelu', alpha: 0.01});
Randomized Leaky Rectified Linear Unit (RReLU) is an activation function that introduces randomness to the negative slope during training, which can act as a form of regularization and help prevent overfitting. The slope is chosen from a uniform distribution within a given range.
Definition: The Randomized Leaky Rectified Linear Unit (RReLU) activation function is defined as

$$f(x) = \begin{cases} x & \text{if } x \ge 0 \\ a\,x & \text{otherwise} \end{cases}$$

where $a$ is drawn from the uniform distribution $U(\text{lower}, \text{upper})$ during training (and is typically fixed to the mean of that range at test time).

Derivative:

$$f'(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ a & \text{otherwise} \end{cases}$$
Example:
layer_defs.push({type:'fc', num_neurons:20, activation:'rrelu', lower: 0.01, upper: 0.1});
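The train/test behaviour described above can be summarised with a small hypothetical helper (not the library's internal code):

```js
// Illustrative helper; helper name and train/test switch are hypothetical.
function rrelu(x, lower, upper, training) {
  var a = training ? lower + Math.random() * (upper - lower)  // a ~ U(lower, upper), redrawn during training
                   : (lower + upper) / 2;                     // fixed to the mean of the range at test time
  return x >= 0 ? x : a * x;
}
console.log(rrelu(-2, 0.01, 0.1, true));   // somewhere between -0.02 and -0.2
console.log(rrelu(-2, 0.01, 0.1, false));  // always -0.11
```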
Some activation functions/loss functions that can be added in the future:
- Smish
- LogSoftmax