[TESTS][DOCS] from zeta.nn.modules.dense_connect import DenseBlock
from zeta.nn.modules.highway_layer import HighwayLayer
from zeta.nn.modules.multi_scale_block import MultiScaleBlock
from zeta.nn.modules.feedback_block import FeedbackBlock
from zeta.nn.modules.dual_path_block import DualPathBlock
from zeta.nn.modules.recursive_block import RecursiveBlock
from zeta.nn.modules._activations import (
    PytorchGELUTanh,
    NewGELUActivation,
    GELUActivation,
    FastGELUActivation,
    QuickGELUActivation,
    ClippedGELUActivation,
    AccurateGELUActivation,
    MishActivation,
    LinearActivation,
    LaplaceActivation,
    ReLUSquaredActivation,
)]
Kye committed Dec 27, 2023
1 parent a71ba60 commit 10aa88a
Showing 45 changed files with 3,009 additions and 3 deletions.
2 changes: 2 additions & 0 deletions .gitignore
```diff
@@ -16,6 +16,7 @@ build/
 develop-eggs/
 dist/
 downloads/
+.errors.txt
 eggs/
 .eggs/
 lib/
@@ -24,6 +25,7 @@ parts/
 sdist/
 var/
 wheels/
+errors.txt
 share/python-wheels/
 *.egg-info/
 .installed.cfg
```
103 changes: 103 additions & 0 deletions docs/zeta/nn/modules/accurategeluactivation.md
@@ -0,0 +1,103 @@
# AccurateGELUActivation

## Overview
The AccurateGELUActivation class is an `nn.Module` that applies a Gaussian Error Linear Unit (GELU) approximation which is faster than the default implementation and more accurate than QuickGELU. This is useful when the exact GELU is considered too computationally expensive or too slow. The class was implemented to support MEGA (Moving Average Equipped Gated Attention) in neural networks.

The class has been designed following the work on GELUs available at: [https://github.com/hendrycks/GELUs](https://github.com/hendrycks/GELUs)
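
Concretely, the module computes the tanh-based approximation of GELU; the formula below simply restates the `forward` method shown in the next section:

$$\mathrm{GELU}(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)$$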

## Class Definition
Here is a look at the parameters and methods used in the `AccurateGELUActivation` class:

```python
import math

import torch
from torch import Tensor, nn


class AccurateGELUActivation(nn.Module):
    """
    Applies GELU approximation that is faster than default and more accurate
    than QuickGELU. See: https://github.com/hendrycks/GELUs

    Implemented along with MEGA (Moving Average Equipped Gated Attention)
    """

    def __init__(self):
        super().__init__()
        self.precomputed_constant = math.sqrt(2 / math.pi)

    def forward(self, input: Tensor) -> Tensor:
        return (
            0.5
            * input
            * (
                1
                + torch.tanh(
                    self.precomputed_constant
                    * (input + 0.044715 * torch.pow(input, 3))
                )
            )
        )
```

The class does not require any parameters during initialization. Here are the explanations for the various attributes and methods in the class:

| Method/Attribute | Description | Argument |
| --- | --- | --- |
| `__init__` | This is the constructor method that gets called when an object is created from the class. | None |
| `forward` | This method is a PyTorch standard for forward propagation in a Module or a neural network layer. It accepts a tensor input and returns a tensor. | `input: Tensor` |

## Class Usage
Now, let's look at some examples of how to use this class.

### Example 1: Basic Usage
```python
import torch
from zeta import AccurateGELUActivation

# Create an instance of the class
gelu_activation = AccurateGELUActivation()

# Create a PyTorch tensor
input = torch.tensor([[-1.0, -0.1, 0.1, 1.0], [0.5, -0.2, -2.1, 3.2]], dtype=torch.float32)

# Use the AccurateGELUActivation instance to activate the input
output = gelu_activation(input)

print(output)
```
This example applies the `AccurateGELUActivation` module element-wise to a two-dimensional input tensor.
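
As a quick sanity check, the module's output should agree with PyTorch's built-in tanh approximation of GELU. The sketch below assumes PyTorch >= 1.12 (where `torch.nn.functional.gelu` accepts `approximate="tanh"`) and uses the import path listed in this commit's `_activations` module:

```python
import torch
import torch.nn.functional as F

from zeta.nn.modules._activations import AccurateGELUActivation

act = AccurateGELUActivation()
x = torch.randn(4, 8)

# PyTorch's tanh-approximate GELU uses the same formula, so the two
# results should match to within floating-point tolerance.
reference = F.gelu(x, approximate="tanh")
print(torch.allclose(act(x), reference, atol=1e-6))  # expected: True
```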

### Example 2: Applying on Neural Network
The AccurateGELUActivation module can also be used as an activation layer in a PyTorch model.

```python
import torch
from torch import Tensor
from torch.nn import Linear, Module

from zeta.nn import AccurateGELUActivation


class Net(Module):
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(10, 5)
        self.fc2 = Linear(5, 2)
        self.activation = AccurateGELUActivation()

    def forward(self, x: Tensor) -> Tensor:
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        return x

# Create a model from the neural network class
model = Net()

input = torch.randn(3, 10)

# Pass the input to the model
output = model(input)

print(output)
```
This example shows how the AccurateGELUActivation module can be integrated as an activation layer in a neural network model, applied here to the intermediate output of the first linear layer.

**Note:** Before applying activation functions like GELU to your models, make sure you understand what they do and what benefits they can bring to your architecture.
79 changes: 79 additions & 0 deletions docs/zeta/nn/modules/clippedgeluactivation.md
@@ -0,0 +1,79 @@
# ClippedGELUActivation


The ClippedGELUActivation class clips the output of the Gaussian Error Linear Unit (GELU) activation to a given range between a minimum and a maximum value. This is particularly useful for quantization, as it makes it possible to map the negative part of the GELU spectrum into a bounded range. To learn more about the underlying concept, see [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/pdf/1712.05877.pdf).

The original implementation of the GeLU activation function was introduced in the Google BERT repository. Note that OpenAI GPT's GeLU is slightly different and gives slightly different results.

## Class Definition

The ClippedGELUActivation class inherits from the `nn.Module` in PyTorch.

```python
import torch
from torch import Tensor, nn
from torch.nn.functional import gelu


class ClippedGELUActivation(nn.Module):
    def __init__(self, min: float, max: float):
        if min > max:
            raise ValueError(
                f"min should be < max (got min: {min}, max: {max})"
            )

        super().__init__()
        self.min = min
        self.max = max

    def forward(self, x: Tensor) -> Tensor:
        return torch.clip(gelu(x), self.min, self.max)
```
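
Equivalently, the forward pass simply clips the standard GELU output to the configured bounds:

$$\mathrm{ClippedGELU}(x) = \mathrm{clip}\bigl(\mathrm{GELU}(x),\ \mathrm{min},\ \mathrm{max}\bigr)$$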

## Class Arguments

| Argument | Type | Description |
|:--------:|:-------:|:----------------------------------------------------------------------------:|
| min | float | The lower limit for the output of GeLU activation. It should be less than `max` |
| max | float | The upper limit for the output of GeLU activation. It should be greater than `min` |

Note: If `min` is greater than `max`, a ValueError will be raised.

## Forward Method Arguments

| Argument | Type | Description |
|:--------:|:-------:|:----------------------------------------------------------------------------:|
| x | Tensor | Input tensor for the forward function of the module |

## Class Example

In the code below, we initialize the ClippedGELUActivation module with a min and max value and input a tensor `x`:

```python
import torch

from zeta.nn import ClippedGELUActivation

# Initialize the module with the clipping bounds
clipped_gelu = ClippedGELUActivation(min=-3.0, max=3.0)

# Create a tensor
x = torch.randn(3, 3)

# Pass the tensor through the module
output = clipped_gelu(x)
```

In this instance, every element of the output tensor is limited to the range of -3.0 to 3.0, inclusive.

## Notes

While using this class, be cautious of the following:
- The constructor checks the bounds: providing a `max` which is less than `min` will raise a `ValueError` (see the short example after this list).
- The `forward` method does not check that all elements of the input tensor `x` are numeric. Non-numeric input may result in unexpected behavior or errors.
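
Because the bounds are validated in the constructor, an invalid configuration fails immediately. A short sketch:

```python
from zeta.nn import ClippedGELUActivation

# min greater than max is rejected at construction time
try:
    ClippedGELUActivation(min=3.0, max=-3.0)
except ValueError as err:
    print(err)  # min should be < max (got min: 3.0, max: -3.0)
```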

## References

For additional information and further exploration about GeLU and its applications, please refer to the following resources:

1. [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415)
2. [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877)
3. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)

Note: This documentation covers the ClippedGELUActivation class, including its parameters, method details, and usage examples, to ensure the class and its methods are well understood.
132 changes: 132 additions & 0 deletions docs/zeta/nn/modules/denseblock.md
@@ -0,0 +1,132 @@
# Class Name: DenseBlock

The `DenseBlock` class is a PyTorch `nn.Module` that wraps another module, allowing complicated neural network architectures to be built from individually abstracted layers. The class gets its name from the dense connection made in the forward pass, which concatenates the output of the wrapped `submodule` with the original input.

While the class might seem simple, understanding how it works is fundamental to defining and using your own custom PyTorch models.

It has two main methods: `__init__()` and `forward()`.

### Method: \_\_init__(self, submodule, *args, **kwargs)

The `__init__()` method is the initializer method of the DenseBlock class. It is called when an object (an instance of the class) is created.

This method sets an attribute of the DenseBlock object to be the `submodule` input, which is assumed to be some `nn.Module` instance.

The method signature is:

def __init__(self, submodule, *args, **kwargs)

#### Arguments

|Name|Type|Description|
|---|---|---|
|submodule|nn.Module|The module that will be applied in the forward pass.|
|args|Variable length argument list|Unused in this implementation, but allows for extra positional arguments.|
|kwargs|Arbitrary keyword arguments|Unused in this implementation, but allows for extra keyword arguments.|

The `submodule` argument should be an initialized instance of the `nn.Module` subclass you want to apply.

The `args` and `kwargs` arguments are not currently used in DenseBlock.

### Method: forward(self, x: torch.Tensor) -> torch.Tensor

The `forward()` method is called during the forward propagation of the neural network.

It applies the module operation to the input tensor `x` and concatenates the input tensor `x` with the output of the `submodule`.

The method signature is:

def forward(self, x: torch.Tensor) -> torch.Tensor

#### Arguments

|Name|Type|Description|
|---|---|---|
|x|torch.Tensor|The input tensor to the module.|

It returns a tensor: the input tensor concatenated with the result of applying `submodule` to the input.
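
For reference, here is a minimal sketch of the behavior described above. It assumes the concatenation happens along dimension 1 (the feature/channel dimension), which matches the shapes in the examples below; the actual zeta implementation may differ in details such as argument handling or concatenation order.

```python
import torch
import torch.nn as nn


class DenseBlockSketch(nn.Module):
    """Illustrative sketch of DenseBlock, not the zeta implementation."""

    def __init__(self, submodule: nn.Module, *args, **kwargs):
        super().__init__()
        self.submodule = submodule  # the wrapped module applied in forward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dense connection: concatenate the input with the submodule's
        # output along the feature/channel dimension.
        return torch.cat([x, self.submodule(x)], dim=1)
```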

## Usage Examples

Here are some examples showing how to use the DenseBlock class. These examples will include the necessary imports, data creation, and model instantiation following PyTorch conventions:

### Example 1: Basic Usage with a Linear Layer

In this example, the `DenseBlock` will include a Linear layer as submodule.

```python
import torch
import torch.nn as nn

from zeta.nn import DenseBlock

# Defining the submodule
lin_layer = nn.Linear(5, 10)

# Defining the DenseBlock
dense_block = DenseBlock(lin_layer)

# Creating a random tensor of shape [10, 5]
random_tensor = torch.randn(10, 5)

# Applying the DenseBlock
output = dense_block(random_tensor)
print(output.shape)  # torch.Size([10, 15])
```

In this example, an input tensor of shape [10, 5] is passed to a dense block wrapping a linear layer. The linear layer maps 5 features to 10, so its output has shape [10, 10], and the concatenated output of the dense block has shape [10, 15].

### Example 2: Using DenseBlock in a Multilayer Neural Network

In this example, a 2-layer neural network using a DenseBlock is shown. The first layer is a DenseBlock wrapping a Linear module that maps 10 input features to 5 (giving a concatenated output of 15 features), and the second layer is a standard Linear layer mapping those 15 features to 1 output.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from zeta.nn import DenseBlock


# Defining a custom model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = DenseBlock(nn.Linear(10, 5))
        self.layer2 = nn.Linear(15, 1)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x

# Initializing the model
net = Net()

# Creating a random tensor of shape [32, 10]
data = torch.randn(32, 10)

# Forward propagation
output = net(data)
```

In this second example, a data batch with `32` samples and input dimensionality of `10` is given to a `Net` neural network with a dense connection in its first layer. The final output shape is [32, 1].

### Example 3: DenseBlock with Convolutional Layer

Lastly, this example shows how to use DenseBlock inside a Convolutional Neural Network:
```python
import torch
import torch.nn as nn
from zeta.nn import DenseBlock

cnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),  # -> [1, 64, 112, 112]
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # -> [1, 64, 56, 56]
    DenseBlock(nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)),  # -> [1, 128, 56, 56]
    nn.AdaptiveAvgPool2d((1, 1)),  # -> [1, 128, 1, 1]
    nn.Flatten(),  # -> [1, 128]
    nn.Linear(128, 10),  # -> [1, 10]
)

x = torch.randn(1, 1, 224, 224)
output = cnn(x)
```

Here, a 2D convolutional layer is used as the submodule within the DenseBlock. By the time the data reaches the DenseBlock, the initial convolution and pooling have reduced the input to shape [1, 64, 56, 56]. The DenseBlock applies its convolutional submodule (which preserves that shape) and concatenates the input and the output along the channel dimension, producing a tensor of shape [1, 128, 56, 56], which is then average-pooled, flattened, and passed to the final linear layer.