[TESTS][DOCS] from zeta.nn.modules.dense_connect import DenseBlock
from zeta.nn.modules.highway_layer import HighwayLayer
from zeta.nn.modules.multi_scale_block import MultiScaleBlock
from zeta.nn.modules.feedback_block import FeedbackBlock
from zeta.nn.modules.dual_path_block import DualPathBlock
from zeta.nn.modules.recursive_block import RecursiveBlock
from zeta.nn.modules._activations import (
    PytorchGELUTanh,
    NewGELUActivation,
    GELUActivation,
    FastGELUActivation,
    QuickGELUActivation,
    ClippedGELUActivation,
    AccurateGELUActivation,
    MishActivation,
    LinearActivation,
    LaplaceActivation,
    ReLUSquaredActivation,
)]
Kye committed Dec 27, 2023
1 parent a71ba60 commit 10aa88a
Showing 45 changed files with 3,009 additions and 3 deletions.
2 changes: 2 additions & 0 deletions .gitignore
```diff
@@ -16,6 +16,7 @@ build/
 develop-eggs/
 dist/
 downloads/
+.errors.txt
 eggs/
 .eggs/
 lib/
@@ -24,6 +25,7 @@ parts/
 sdist/
 var/
 wheels/
+errors.txt
 share/python-wheels/
 *.egg-info/
 .installed.cfg
```
103 changes: 103 additions & 0 deletions docs/zeta/nn/modules/accurategeluactivation.md
@@ -0,0 +1,103 @@
# AccurateGELUActivation

## Overview
The AccurateGELUActivation class is an `nn.Module` that applies a Gaussian Error Linear Unit (GELU) approximation which is faster than the default implementation and more accurate than QuickGELU. This is useful when the exact GELU is considered too computationally expensive or too slow. The class was implemented to support MEGA (Moving Average Equipped Gated Attention) in neural networks.

The class has been designed following the work on GELUs available at: [https://github.com/hendrycks/GELUs](https://github.com/hendrycks/GELUs)
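
Concretely, the module computes the tanh-based approximation of GELU; the formula below simply restates the `forward` method shown in the next section:

$$\mathrm{GELU}(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)$$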

## Class Definition
Here is a look at the parameters and methods used in the `AccurateGELUActivation` class:

```python
import math

import torch
from torch import Tensor, nn


class AccurateGELUActivation(nn.Module):
    """
    Applies GELU approximation that is faster than default and more accurate
    than QuickGELU. See: https://github.com/hendrycks/GELUs

    Implemented along with MEGA (Moving Average Equipped Gated Attention)
    """

    def __init__(self):
        super().__init__()
        self.precomputed_constant = math.sqrt(2 / math.pi)

    def forward(self, input: Tensor) -> Tensor:
        return (
            0.5
            * input
            * (
                1
                + torch.tanh(
                    self.precomputed_constant
                    * (input + 0.044715 * torch.pow(input, 3))
                )
            )
        )
```

The class does not require any parameters during initialization. Here are the explanations for the various attributes and methods in the class:

| Method/Attribute | Description | Argument |
| --- | --- | --- |
| `__init__` | This is the constructor method that gets called when an object is created from the class. | None |
| `forward` | This method is a PyTorch standard for forward propagation in a Module or a neural network layer. It accepts a tensor input and returns a tensor. | `input: Tensor` |

## Class Usage
Now, let's look at some examples of how to use this class.

### Example 1: Basic Usage
```python
import torch
from zeta import AccurateGELUActivation

# Create an instance of the class
gelu_activation = AccurateGELUActivation()

# Create a PyTorch tensor
input = torch.tensor([[-1.0, -0.1, 0.1, 1.0], [0.5, -0.2, -2.1, 3.2]], dtype=torch.float32)

# Use the AccurateGELUActivation instance to activate the input
output = gelu_activation(input)

print(output)
```
This example applies the `AccurateGELUActivation` module element-wise to a two-dimensional input tensor.
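
As a quick sanity check, the module's output should agree with PyTorch's built-in tanh approximation of GELU. The sketch below assumes PyTorch >= 1.12 (where `torch.nn.functional.gelu` accepts `approximate="tanh"`) and uses the import path listed in this commit's `_activations` module:

```python
import torch
import torch.nn.functional as F

from zeta.nn.modules._activations import AccurateGELUActivation

act = AccurateGELUActivation()
x = torch.randn(4, 8)

# PyTorch's tanh-approximate GELU uses the same formula, so the two
# results should match to within floating-point tolerance.
reference = F.gelu(x, approximate="tanh")
print(torch.allclose(act(x), reference, atol=1e-6))  # expected: True
```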

### Example 2: Applying on Neural Network
The AccurateGELUActivation module can also be used as an activation layer in a PyTorch model.

```python
import torch
from torch import Tensor
from torch.nn import Linear, Module

from zeta.nn import AccurateGELUActivation


class Net(Module):
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(10, 5)
        self.fc2 = Linear(5, 2)
        self.activation = AccurateGELUActivation()

    def forward(self, x: Tensor) -> Tensor:
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        return x

# Create a model from the neural network class
model = Net()

input = torch.randn(3, 10)

# Pass the input to the model
output = model(input)

print(output)
```
This example shows how the AccurateGELUActivation module can be integrated as an activation layer in a neural network model, applied here to the intermediate output of the first linear layer.

**Note:** Before applying activation functions like GELU to your models, make sure you understand what they do and what benefits they can bring to your architecture.
79 changes: 79 additions & 0 deletions docs/zeta/nn/modules/clippedgeluactivation.md
@@ -0,0 +1,79 @@
# ClippedGELUActivation


The ClippedGELUActivation class clips the output of the Gaussian Error Linear Unit (GELU) activation to a given range between a minimum and a maximum value. This is particularly useful for quantization, as it makes it possible to map the negative part of the GELU spectrum into a bounded range. To learn more about the underlying concept, see [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/pdf/1712.05877.pdf).

The original implementation of the GeLU activation function was introduced in the Google BERT repository. Note that OpenAI GPT's GeLU is slightly different and gives slightly different results.

## Class Definition

The ClippedGELUActivation class inherits from the `nn.Module` in PyTorch.

```python
import torch
from torch import Tensor, nn
from torch.nn.functional import gelu


class ClippedGELUActivation(nn.Module):
    def __init__(self, min: float, max: float):
        if min > max:
            raise ValueError(
                f"min should be < max (got min: {min}, max: {max})"
            )

        super().__init__()
        self.min = min
        self.max = max

    def forward(self, x: Tensor) -> Tensor:
        return torch.clip(gelu(x), self.min, self.max)
```
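
Equivalently, the forward pass simply clips the standard GELU output to the configured bounds:

$$\mathrm{ClippedGELU}(x) = \mathrm{clip}\bigl(\mathrm{GELU}(x),\ \mathrm{min},\ \mathrm{max}\bigr)$$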

## Class Arguments

| Argument | Type | Description |
|:--------:|:-------:|:----------------------------------------------------------------------------:|
| min | float | The lower limit for the output of GeLU activation. It should be less than `max` |
| max | float | The upper limit for the output of GeLU activation. It should be greater than `min` |

Note: If `min` is greater than `max`, a ValueError will be raised.

## Forward Method Arguments

| Argument | Type | Description |
|:--------:|:-------:|:----------------------------------------------------------------------------:|
| x | Tensor | Input tensor for the forward function of the module |

## Class Example

In the code below, we initialize the ClippedGELUActivation module with a min and max value and input a tensor `x`:

```python
import torch

from zeta.nn import ClippedGELUActivation

# Initialize the module with the clipping bounds
clipped_gelu = ClippedGELUActivation(min=-3.0, max=3.0)

# Create a tensor
x = torch.randn(3, 3)

# Pass the tensor through the module
output = clipped_gelu(x)
```

In this instance, every element of the output tensor is limited to the range of -3.0 to 3.0, inclusive.

## Notes

While using this class, be cautious of the following:
- The constructor checks the bounds: providing a `max` which is less than `min` will raise a `ValueError` (see the short example after this list).
- The `forward` method does not check that all elements of the input tensor `x` are numeric. Non-numeric input may result in unexpected behavior or errors.
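
Because the bounds are validated in the constructor, an invalid configuration fails immediately. A short sketch:

```python
from zeta.nn import ClippedGELUActivation

# min greater than max is rejected at construction time
try:
    ClippedGELUActivation(min=3.0, max=-3.0)
except ValueError as err:
    print(err)  # min should be < max (got min: 3.0, max: -3.0)
```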

## References

For additional information and further exploration about GeLU and its applications, please refer to the following resources:

1. [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415)
2. [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877)
3. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)

Note: This documentation covers the ClippedGELUActivation class, including its parameters, method details, and usage examples, to ensure the class and its methods are well understood.
132 changes: 132 additions & 0 deletions docs/zeta/nn/modules/denseblock.md
@@ -0,0 +1,132 @@
# Class Name: DenseBlock

The `DenseBlock` class is a PyTorch `nn.Module` that wraps another module, allowing complicated neural network architectures to be built from individually abstracted layers. The class gets its name from the dense connection made in the forward pass, which concatenates the output of the wrapped `submodule` with the original input.

While the class might seem simple, understanding how it works is fundamental to defining and using your own custom PyTorch models.

It has two main methods: `__init__()` and `forward()`.

### Method: \_\_init__(self, submodule, *args, **kwargs)

The `__init__()` method is the initializer method of the DenseBlock class. It is called when an object (an instance of the class) is created.

This method sets an attribute of the DenseBlock object to be the `submodule` input, which is assumed to be some `nn.Module` instance.

The method signature is:

def __init__(self, submodule, *args, **kwargs)

#### Arguments

|Name|Type|Description|
|---|---|---|
|submodule|nn.Module|The module that will be applied in the forward pass.|
|args|Variable length argument list|Unused in this implementation, but allows for extra positional arguments.|
|kwargs|Arbitrary keyword arguments|Unused in this implementation, but allows for extra keyword arguments.|

The `submodule` argument should be an initialized instance of the `nn.Module` subclass you want to apply.

The `args` and `kwargs` arguments are not currently used in DenseBlock.

### Method: forward(self, x: torch.Tensor) -> torch.Tensor

The `forward()` method is called during the forward propagation of the neural network.

It applies the module operation to the input tensor `x` and concatenates the input tensor `x` with the output of the `submodule`.

The method signature is:

def forward(self, x: torch.Tensor) -> torch.Tensor

#### Arguments

|Name|Type|Description|
|---|---|---|
|x|torch.Tensor|The input tensor to the module.|

It returns a tensor: the input tensor concatenated with the result of applying `submodule` to the input.
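
For reference, here is a minimal sketch of the behavior described above. It assumes the concatenation happens along dimension 1 (the feature/channel dimension), which matches the shapes in the examples below; the actual zeta implementation may differ in details such as argument handling or concatenation order.

```python
import torch
import torch.nn as nn


class DenseBlockSketch(nn.Module):
    """Illustrative sketch of DenseBlock, not the zeta implementation."""

    def __init__(self, submodule: nn.Module, *args, **kwargs):
        super().__init__()
        self.submodule = submodule  # the wrapped module applied in forward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dense connection: concatenate the input with the submodule's
        # output along the feature/channel dimension.
        return torch.cat([x, self.submodule(x)], dim=1)
```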

## Usage Examples

Here are some examples showing how to use the DenseBlock class. These examples will include the necessary imports, data creation, and model instantiation following PyTorch conventions:

### Example 1: Basic Usage with a Linear Layer

In this example, the `DenseBlock` will include a Linear layer as submodule.

```python
import torch
import torch.nn as nn

from zeta.nn import DenseBlock

# Defining the submodule
lin_layer = nn.Linear(5, 10)

# Defining the DenseBlock
dense_block = DenseBlock(lin_layer)

# Creating a random tensor of shape [10, 5]
random_tensor = torch.randn(10, 5)

# Applying the DenseBlock
output = dense_block(random_tensor)
print(output.shape)  # torch.Size([10, 15])
```

In this example, an input tensor of shape [10, 5] is passed to a dense block wrapping a linear layer. The linear layer maps 5 features to 10, so its output has shape [10, 10], and the concatenated output of the dense block has shape [10, 15].

### Example 2: Using DenseBlock in a Multilayer Neural Network

In this example, a 2-layer neural network using a DenseBlock is shown. The first layer is a DenseBlock wrapping a Linear module that maps 10 input features to 5 (giving a concatenated output of 15 features), and the second layer is a standard Linear layer mapping those 15 features to 1 output.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from zeta.nn import DenseBlock


# Defining a custom model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = DenseBlock(nn.Linear(10, 5))
        self.layer2 = nn.Linear(15, 1)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x

# Initializing the model
net = Net()

# Creating a random tensor of shape [32, 10]
data = torch.randn(32, 10)

# Forward propagation
output = net(data)
```

In this second example, a data batch with `32` samples and input dimensionality of `10` is given to a `Net` neural network with a dense connection in its first layer. The final output shape is [32, 1].

### Example 3: DenseBlock with Convolutional Layer

Lastly, this example shows how to use DenseBlock inside a Convolutional Neural Network:
```python
import torch
import torch.nn as nn
from zeta.nn import DenseBlock

cnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),  # -> [1, 64, 112, 112]
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # -> [1, 64, 56, 56]
    DenseBlock(nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)),  # -> [1, 128, 56, 56]
    nn.AdaptiveAvgPool2d((1, 1)),  # -> [1, 128, 1, 1]
    nn.Flatten(),  # -> [1, 128]
    nn.Linear(128, 10),  # -> [1, 10]
)

x = torch.randn(1, 1, 224, 224)
output = cnn(x)
```

Here, a 2D convolutional layer is used as the submodule within the DenseBlock. By the time the data reaches the DenseBlock, the initial convolution and pooling have reduced the input to shape [1, 64, 56, 56]. The DenseBlock applies its convolutional submodule (which preserves that shape) and concatenates the input and the output along the channel dimension, producing a tensor of shape [1, 128, 56, 56], which is then average-pooled, flattened, and passed to the final linear layer.