Raven Distribution Framework's Deep Learning Library


RavDL - Deep Learning Library

Introducing Raven Protocol's Distributed Deep Learning tool that allows Requesters to easily build, train and test their neural networks by leveraging the compute power of participating nodes across the globe.

RavDL can be thought of as a high-level wrapper, written in Python, that defines the mathematical backend for neural network layers using the fundamental operations of the Ravop library. It provides the essential abstractions for training complex deep learning architectures in the Ravenverse.

This framework integrates seamlessly with the Ravenverse, where models are divided into optimized subgraphs that are assigned to participating nodes for secure computation. Once all subgraphs have been computed, the saved model is returned to the requester.

In this manner, a requester can securely train complex models without dedicating his or her own system for this heavy and time-consuming task.

There is something in it for the providers too! The nodes that contribute their processing power will be rewarded with tokens proportionate to the capabilities of their systems and duration of participation. More information is available here.





Make sure Ravop is installed and working properly.

With PIP

pip install ravdl




Dense(n_units, initial_W=None, initial_w0=None, use_bias='True') 


  • n_units: Output dimension of the layer
  • initial_W: Initial weights of the layer
  • initial_w0: Initial bias of the layer
  • use_bias: Whether to use bias or not


  • Input: (batch_size, ..., input_dim)
  • Output: (batch_size, ..., n_units)
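To make the shape contract above concrete, here is a plain NumPy sketch of what a dense layer computes. It is not RavDL's implementation: the weight layout `(input_dim, n_units)` and the function name `dense_forward` are assumptions for illustration only.

```python
import numpy as np

def dense_forward(x, W, w0=None):
    """Hypothetical sketch of a dense layer.

    x:  (batch_size, ..., input_dim)
    W:  (input_dim, n_units)   # assumed weight layout
    w0: (n_units,) or None     # bias, skipped when no bias is used
    """
    out = x @ W                # the matrix product only acts on the last axis
    if w0 is not None:
        out = out + w0         # bias broadcasts over all leading axes
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 16))    # batch of 8, extra axis of 5, input_dim 16
W = rng.standard_normal((16, 32))      # n_units = 32
w0 = np.zeros(32)
print(dense_forward(x, W, w0).shape)   # (8, 5, 32)
```

Because the product only touches the last axis, any number of middle dimensions passes through unchanged, which is why the shapes are written as `(batch_size, ..., input_dim)`.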


BatchNormalization1D(momentum=0.99, epsilon=0.01, affine=True, initial_gamma=None, initial_beta=None, initial_running_mean=None, initial_running_var=None)


  • momentum: Momentum for the moving average and variance
  • epsilon: Small value to avoid division by zero
  • affine: Whether to learn the scaling and shifting parameters
  • initial_gamma: Initial scaling parameter
  • initial_beta: Initial shifting parameter
  • initial_running_mean: Initial running mean
  • initial_running_var: Initial running variance


  • Input: (batch_size, channels) or (batch_size, channels, length)
  • Output: same as input
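The following NumPy sketch shows how the parameters above typically interact; it is illustrative, not RavDL's code. Two points are assumptions: the running statistics use a Keras-style update where `momentum` weights the previous value (suggested by the 0.99 default, but not confirmed by this document), and the function name `batchnorm1d_forward` is hypothetical.

```python
import numpy as np

def batchnorm1d_forward(x, gamma, beta, running_mean, running_var,
                        momentum=0.99, epsilon=0.01, training=True):
    """Hypothetical NumPy sketch of BatchNormalization1D for (N, C) or (N, C, L) input."""
    axes = (0,) if x.ndim == 2 else (0, 2)          # reduce over every axis except channels
    shape = (1, -1) if x.ndim == 2 else (1, -1, 1)  # lets per-channel vectors broadcast
    if training:
        mean, var = x.mean(axis=axes), x.var(axis=axes)
        # Assumed Keras-style update: momentum weights the previous running value
        running_mean = momentum * running_mean + (1 - momentum) * mean
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        # inference uses the stored running statistics instead of batch statistics
        mean, var = running_mean, running_var
    x_hat = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + epsilon)
    out = gamma.reshape(shape) * x_hat + beta.reshape(shape)   # affine scale and shift
    return out, running_mean, running_var
```

The `training` branch is the reason the layer's forward pass needs to know whether the model is training or predicting.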


BatchNormalization2D(num_features, momentum=0.99, epsilon=0.01, affine=True, initial_gamma=None, initial_beta=None, initial_running_mean=None, initial_running_var=None)


  • num_features: Number of channels in the input
  • momentum: Momentum for the moving average and variance
  • epsilon: Small value to avoid division by zero
  • affine: Whether to learn the scaling and shifting parameters
  • initial_gamma: Initial scaling parameter
  • initial_beta: Initial shifting parameter
  • initial_running_mean: Initial running mean
  • initial_running_var: Initial running variance


  • Input: (batch_size, channels, height, width)
  • Output: same as input
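For the 2D variant, the main difference is which axes the statistics are pooled over: one mean and variance per channel, computed across the batch and both spatial axes, so `num_features` must match the channel dimension. The sketch below is a hypothetical NumPy illustration of training-mode normalization only; running statistics are omitted for brevity.

```python
import numpy as np

def batchnorm2d_forward(x, gamma, beta, epsilon=0.01):
    """Training-mode sketch for x of shape (batch_size, channels, height, width)."""
    # one mean/variance per channel, pooled over batch, height and width
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    # per-channel scale and shift, reshaped so they broadcast over N, H and W
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```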


LayerNormalization(normalized_shape=None, epsilon=1e-5, initial_W=None, initial_w0=None)


  • normalized_shape: Shape of the input or integer representing the last dimension of the input
  • epsilon: Small value to avoid division by zero
  • initial_W: Initial weights of the layer
  • initial_w0: Initial bias of the layer


  • Input: (batch_size, ...)
  • Output: same as input
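Unlike batch normalization, layer normalization computes its statistics per sample over the trailing `normalized_shape` axes, so it does not depend on the batch. A minimal NumPy sketch, assuming `initial_W` and `initial_w0` act as an elementwise scale and shift of shape `normalized_shape` (the function name `layer_norm` is hypothetical):

```python
import numpy as np

def layer_norm(x, normalized_shape, W=None, w0=None, epsilon=1e-5):
    """Hypothetical sketch: normalize each sample over its trailing dimensions."""
    if isinstance(normalized_shape, int):
        normalized_shape = (normalized_shape,)   # an int means "the last dimension"
    axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    out = (x - mean) / np.sqrt(var + epsilon)
    if W is not None:
        out = out * W     # assumed elementwise scale, shape == normalized_shape
    if w0 is not None:
        out = out + w0    # assumed elementwise shift
    return out
```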




  • p: Probability of dropping out a unit


  • Input: any shape
  • Output: same as input
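Here is a sketch of how a dropout probability `p` is commonly applied. It uses "inverted dropout", which rescales the surviving units during training so nothing changes at inference time; whether RavDL scales at training or at inference time is not stated here, so treat this as an assumption. The function name `dropout` is illustrative.

```python
import numpy as np

def dropout(x, p, training=True, rng=None):
    """Inverted-dropout sketch; p is the probability of dropping a unit."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x                          # identity at inference time
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= p       # each unit survives with probability 1 - p
    # rescale the survivors so the expected activation matches inference time
    return x * keep / (1.0 - p)
```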




  • name: Name of the activation function

Currently supported: 'relu', 'sigmoid', 'tanh', 'softmax', 'leaky_relu', 'elu', 'selu', 'softplus', 'softsign', 'tanhshrink', 'logsigmoid', 'hardshrink', 'hardtanh', 'softmin', 'softshrink', 'threshold'


  • Input: any shape
  • Output: same as input
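For reference, the sketch below gives NumPy definitions of a few of the supported names. These are the standard textbook formulas; the constants for `leaky_relu` (slope 0.01) and `elu` (alpha 1) are common defaults assumed here, not values confirmed for RavDL, and the `ACTIVATIONS` table is purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtracting the row maximum keeps np.exp from overflowing
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / np.sum(z, axis=axis, keepdims=True)

ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "tanh": np.tanh,
    "softmax": softmax,
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),  # negative slope 0.01 assumed
    "elu": lambda x: np.where(x > 0, x, np.expm1(x)),       # alpha = 1 assumed
    "softplus": lambda x: np.logaddexp(0.0, x),             # log(1 + e^x), overflow-safe
    "softsign": lambda x: x / (1.0 + np.abs(x)),
}
```

All of these act elementwise except `softmax`, which normalizes along an axis; either way the output shape equals the input shape.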


Conv2D(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', initial_W=None, initial_w0=None)


  • in_channels: Number of channels in the input image
  • out_channels: Number of channels produced by the convolution
  • kernel_size: Size of the convolving kernel
  • stride: Stride of the convolution
  • padding: Padding added to all 4 sides of the input (int, tuple or string)
  • dilation: Spacing between kernel elements
  • groups: Number of blocked connections from input channels to output channels
  • bias: If True, adds a learnable bias to the output
  • padding_mode: 'zeros', 'reflect', 'replicate' or 'circular'
  • initial_W: Initial weights of the layer
  • initial_w0: Initial bias of the layer


  • Input: (batch_size, in_channels, height, width)
  • Output: (batch_size, out_channels, new_height, new_width)
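The Conv2D parameters mirror PyTorch's `Conv2d`. Assuming RavDL follows the same convention, `new_height` and `new_width` can be computed with the standard formula below. This sketch covers integer padding and a square kernel only; string padding modes are not handled, and the helper names are hypothetical.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def conv2d_out_shape(input_shape, out_channels, kernel_size, stride=1, padding=0, dilation=1):
    batch, _, height, width = input_shape      # (batch_size, in_channels, height, width)
    args = (kernel_size, stride, padding, dilation)
    return (batch, out_channels, conv2d_out_size(height, *args), conv2d_out_size(width, *args))

print(conv2d_out_shape((32, 1, 28, 28), out_channels=16, kernel_size=3))  # (32, 16, 26, 26)
```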


Flatten(start_dim=1, end_dim=-1)


  • start_dim: First dimension to flatten
  • end_dim: Last dimension to flatten


  • Input: (batch_size, ...)
  • Output: (batch_size, flattened_dimension)
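A NumPy sketch of the flattening rule, assuming `start_dim` and `end_dim` are inclusive and may be negative as in PyTorch's `flatten` (the function name is illustrative). With the defaults, everything after the batch axis is merged:

```python
import numpy as np

def flatten(x, start_dim=1, end_dim=-1):
    """Merge axes start_dim..end_dim (inclusive) into a single axis."""
    nd = x.ndim
    start, end = start_dim % nd, end_dim % nd          # resolve negative indices
    merged = int(np.prod(x.shape[start:end + 1]))
    return x.reshape(x.shape[:start] + (merged,) + x.shape[end + 1:])

print(flatten(np.zeros((2, 3, 4, 5))).shape)  # (2, 60)
```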


MaxPooling2D(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)


  • kernel_size: Size of the max pooling window
  • stride: Stride of the max pooling window
  • padding: Zero-padding added to both sides of the input
  • dilation: Spacing between kernel elements
  • return_indices: If True, will return the max indices along with the outputs
  • ceil_mode: If True, will use ceil instead of floor to compute the output shape


  • Input: (batch_size, channels, height, width)
  • Output: (batch_size, channels, new_height, new_width)
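The MaxPooling2D signature also matches PyTorch's `MaxPool2d`. Assuming the same conventions (in particular that `stride=None` falls back to `kernel_size`), each spatial output size can be computed as sketched below; `ceil_mode` switches the rounding and discards a last window that would start in the right-hand padding. The helper name is hypothetical.

```python
import math

def maxpool2d_out_size(size, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False):
    stride = kernel_size if stride is None else stride   # assumed default: window size
    span = size + 2 * padding - dilation * (kernel_size - 1) - 1
    rounding = math.ceil if ceil_mode else math.floor
    out = rounding(span / stride) + 1
    # in ceil mode, drop a final window that would start inside the right padding
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out

print(maxpool2d_out_size(28, kernel_size=2))  # 14
```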


Embedding(vocab_size, embed_dim, initial_W=None)


  • vocab_size: Size of the vocabulary
  • embed_dim: Dimension of the embedding
  • initial_W: Initial weights of the layer


  • Input: (batch_size, sequence_length)
  • Output: (batch_size, sequence_length, embed_dim)
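An embedding layer is a lookup table: each integer token id selects one row of a `(vocab_size, embed_dim)` weight matrix, which turns a `(batch_size, sequence_length)` array of ids into `(batch_size, sequence_length, embed_dim)` vectors. A minimal NumPy sketch, assuming the weight layout matches `initial_W` (the helper name is hypothetical):

```python
import numpy as np

def embedding_lookup(W, tokens):
    """Each integer id in tokens selects one row of the (vocab_size, embed_dim) table W."""
    return W[tokens]

vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((vocab_size, embed_dim))
tokens = np.array([[1, 5, 5, 0],
                   [9, 2, 3, 3]])            # (batch_size=2, sequence_length=4)
vectors = embedding_lookup(W, tokens)
print(vectors.shape)                         # (2, 4, 4)
```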




RMSprop(lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)


  • lr: Learning rate
  • alpha: Smoothing constant
  • eps: Term added to the denominator to improve numerical stability
  • weight_decay: Weight decay (L2 penalty)
  • momentum: Momentum factor
  • centered: If True, compute the centered RMSProp, in which the gradient is normalized by an estimate of its variance
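To show how these parameters enter a single update, here is a NumPy sketch of an RMSprop step following PyTorch's published formulation (which uses the same parameter names). The `momentum` term is omitted for brevity, and `RMSpropSketch` is a hypothetical class, not RavDL's optimizer.

```python
import numpy as np

class RMSpropSketch:
    """Plain-NumPy RMSprop step (PyTorch-style update, momentum omitted)."""

    def __init__(self, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, centered=False):
        self.lr, self.alpha, self.eps = lr, alpha, eps
        self.weight_decay, self.centered = weight_decay, centered
        self.sq_avg = None      # running average of squared gradients
        self.grad_avg = None    # running average of gradients (centered variant)

    def step(self, param, grad):
        if self.weight_decay:
            grad = grad + self.weight_decay * param     # L2 penalty folded into the gradient
        if self.sq_avg is None:
            self.sq_avg = np.zeros_like(param)
            self.grad_avg = np.zeros_like(param)
        # alpha is the smoothing constant of the moving average
        self.sq_avg = self.alpha * self.sq_avg + (1 - self.alpha) * grad ** 2
        denom = self.sq_avg
        if self.centered:
            self.grad_avg = self.alpha * self.grad_avg + (1 - self.alpha) * grad
            denom = denom - self.grad_avg ** 2          # estimate of the gradient variance
        return param - self.lr * grad / (np.sqrt(denom) + self.eps)
```

For example, minimizing f(x) = x² (gradient 2x) from x = 5 with the defaults, the first step gives x ≈ 4.9.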


Adam(lr=0.001, betas=(0.9,0.999), eps=1e-08, weight_decay=0, amsgrad=False)


  • lr: Learning rate
  • betas: Coefficients used for computing running averages of gradient and its square
  • eps: Term added to the denominator to improve numerical stability
  • weight_decay: Weight decay (L2 penalty)
  • amsgrad: If True, use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond
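Similarly, a NumPy sketch of one Adam step shows where `betas` and `eps` are used, including the bias correction of both moment estimates. It follows the original Adam formulation with L2-style weight decay as in PyTorch; the AMSGrad variant is omitted, and `AdamSketch` is a hypothetical class rather than RavDL's optimizer.

```python
import numpy as np

class AdamSketch:
    """Plain-NumPy Adam step (amsgrad omitted)."""

    def __init__(self, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0):
        self.lr, self.eps, self.weight_decay = lr, eps, weight_decay
        self.b1, self.b2 = betas
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.weight_decay:
            grad = grad + self.weight_decay * param             # L2 penalty
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad        # running average of gradient
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2   # running average of its square
        m_hat = self.m / (1 - self.b1 ** self.t)                # bias corrections
        v_hat = self.v / (1 - self.b2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Thanks to the bias correction, the very first step has magnitude close to `lr` regardless of the gradient's scale.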



  • Mean Squared Error
ravop.square_loss(y_true, y_pred)
  • Cross Entropy
ravop.cross_entropy_loss(y_true, y_pred, ignore_index=None, reshape_target=None, reshape_label=None)
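The textbook definitions behind these two losses can be written in NumPy as follows. This is only a reference sketch: RavDL's own reduction and scaling may differ (some implementations halve the squared error), and the `ignore_index` and reshape options are not modelled here.

```python
import numpy as np

def square_loss(y_true, y_pred):
    # mean squared error averaged over all elements
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # y_true: one-hot targets, y_pred: predicted probabilities (e.g. softmax output)
    p = np.clip(y_pred, eps, 1.0)          # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(p), axis=-1))

print(square_loss(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])))  # 4/3
```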



This section gives a more detailed walkthrough on how a requester can define their ML/DL architectures in Python by using RavDL and Ravop functionalities.

Note: The complete scripts of the functionalities demonstrated in this document are available in the Ravenverse Repository.

Authentication and Graph Definition

The Requester must connect to the Ravenverse using a unique token that they can generate by logging in on Raven's Website using their MetaMask wallet credentials.

import ravop as R

In the Ravenverse, each script executed by a requester is treated as a collection of Ravop Operations (Ops) called a Graph.

Note: In the current release, a requester can execute only one graph with their unique token. Therefore, the requester must use the R.flush() method to clear any previous or existing graphs.

The next step involves the creation of a Graph...


R.Graph(name='cnn', algorithm='convolutional_neural_network', approach='distributed')

Note: The name and algorithm parameters can be set to any string. However, the approach parameter must be set to either "distributed" or "federated".

The current release of RavDL supports TorchScript, Functional and Sequential model definitions.

GPU Support

The requester can now utilize GPU resources on provider nodes by setting the gpu_required parameter to 'yes' in the Graph definition.

R.Graph(name='cnn', algorithm='convolutional_neural_network', approach='distributed', gpu_required='yes')

Important: This feature is currently in beta and is available only for the TorchScript Model Definition.


Functional Model Definition

Define Custom Layers

The latest release of RavDL allows the requester to define custom layers, so they can write their own application-specific layers either from scratch or as compositions of existing layers.

A custom layer is defined by subclassing the CustomLayer class from the ravdl.v2.layers module. The requester's class must implement the following methods:

from ravdl.v2.layers import CustomLayer, Dense, BatchNormalization1D, Dropout

class CustomLayer1(CustomLayer):
    def __init__(self) -> None:
        super().__init__()
        # n_hidden and n_features are chosen by the requester for their dataset
        self.d1 = Dense(n_hidden, input_shape=(n_features,))
        self.bn1 = BatchNormalization1D(momentum=0.99, epsilon=0.01)

    def _forward_pass_call(self, input, training=True):
        o = self.d1._forward_pass(input)
        o = self.bn1._forward_pass(o, training=training)
        return o

class CustomLayer2(CustomLayer):
    def __init__(self) -> None:
        super().__init__()
        self.d1 = Dense(30)
        self.dropout = Dropout(0.9)
        self.d2 = Dense(3)

    def _forward_pass_call(self, input, training=True):
        o = self.d1._forward_pass(input)
        # pass the training flag through so dropout is only applied while training
        o = self.dropout._forward_pass(o, training=training)
        o = self.d2._forward_pass(o)
        return o

Defining Custom Model Class

The custom model class is defined by subclassing the Functional class from the ravdl.v2 module. This lets the requester build their own model class by composing custom and existing layers.

The requester's class must implement the following methods:

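from ravdl.v2 import Functional
from ravdl.v2.layers import Activation

class ANNModel(Functional):
    def __init__(self, optimizer):
        super().__init__()
        self.custom_layer1 = CustomLayer1()
        self.custom_layer2 = CustomLayer2()
        self.act = Activation('softmax')
        # initializes the model parameters and sets the optimizer for the model
        self.initialize_params(optimizer)

    def _forward_pass_call(self, input, training=True):
        o = self.custom_layer1._forward_pass(input, training=training)
        o = self.custom_layer2._forward_pass(o, training=training)
        o = self.act._forward_pass(o)
        return o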

Note: The initialize_params method must be called in the __init__ method of the custom model class. This method initializes the parameters of the model and sets the optimizer for the model.

Defining the Training Loop

The requester can now define their training loop using the batch_iterator function from the ravdl.v2.utils module. This function takes the input and target data as arguments and returns a generator that yields one batch of data per iteration.
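The generator behaviour described above can be pictured with this hypothetical stand-in, `batch_iterator_sketch`. It yields consecutive slices in order, with a smaller final batch when the sample count is not a multiple of `batch_size`; whether RavDL's batch_iterator shuffles or drops the remainder is not stated in this document.

```python
import numpy as np

def batch_iterator_sketch(X, y=None, batch_size=64):
    """Hypothetical stand-in: yield consecutive (X, y) mini-batches in order."""
    n_samples = X.shape[0]
    for begin in range(0, n_samples, batch_size):
        end = min(begin + batch_size, n_samples)   # last batch may be smaller
        if y is not None:
            yield X[begin:end], y[begin:end]
        else:
            yield X[begin:end]

X = np.arange(101 * 3).reshape(101, 3)
y = np.arange(101)
print([len(xb) for xb, _ in batch_iterator_sketch(X, y, batch_size=25)])  # [25, 25, 25, 25, 1]
```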

Note that the _forward_pass() and _backward_pass() methods of the custom model class must be called in the training loop.

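import ravop as R
from ravdl.v2.optimizers import Adam
from ravdl.v2.utils import batch_iterator

optimizer = Adam()
model = ANNModel(optimizer)

epochs = 100

for i in range(epochs):
    for X_batch, y_batch in batch_iterator(X, y, batch_size=25):
        X_t = R.t(X_batch)
        y_t = R.t(y_batch)

        out = model._forward_pass(X_t)
        loss = R.square_loss(y_t, out)
        # backpropagate the loss through the model
        model._backward_pass(loss)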

Make a Prediction

out = model._forward_pass(R.t(X_test), training=False)

Note: The _forward_pass() method takes an additional argument training which is set to True by default. This argument is used to determine whether the model is in training mode or not. The _forward_pass() method must be called with training=False when making predictions.

Complete example scripts for the Functional model can be found here:


Sequential Model Definition

Setting Model Parameters

from ravdl.v2 import NeuralNetwork
from ravdl.v2.optimizers import RMSprop, Adam
from ravdl.v2.layers import Activation, Dense, BatchNormalization1D, Dropout, Conv2D, Flatten, MaxPooling2D

model = NeuralNetwork(optimizer=RMSprop(), loss='SquareLoss')

Adding Layers to Model

model.add(Dense(n_hidden, input_shape=(n_features,)))

You can view a summary of the model in tabular format...


Training the Model

train_err = model.fit(X, y, n_epochs=5, batch_size=25)

By default, the batch losses for each epoch are made to persist in the Ravenverse and can be retrieved later on as and when the computations of those losses are completed.

Testing the Model on Ravenverse

If required, model inference can be tested by using the predict function. The output is stored as an Op and should be made to persist in order to view it later on.

y_test_pred = model.predict(X_test)


TorchScript Model Op

RavDL now supports direct loading and training of PyTorch models on the Ravenverse using our powerful distribution strategies.

Note: The requester must first convert their PyTorch model to a TorchScript model using the torch.jit.script function. We recommend referring to the PyTorch documentation for more information on TorchScript.

Creating the Model Op

The Model Op takes a TorchScript model file (.pt) as an argument and creates an Op that can be used to train the model on the Ravenverse.

import ravop as R

model_op = R.model('')

This model op can now be loaded directly into a RavDL model and further used to define the training loop.

from ravdl.v2 import Pytorch_Model
from ravdl.v2.optimizers import Adam

optimizer = Adam()
model = Pytorch_Model(model_op=model_op)

Training the Model

import numpy as np
import ravop as R
from ravdl.v2.utils import batch_iterator

epochs = 100  # example value

for i in range(epochs):
    for X_batch, y_batch in batch_iterator(X, y, batch_size=256):
        X_t = R.t(X_batch.astype(np.float32))
        y_t = R.t(y_batch.astype(np.float32))
        out = model._forward_pass(X_t)
        loss = R.square_loss(y_t, out)
        # Set step=True whenever the optimizer step needs to be called after backprop (defaults to True).
        model._backward_pass(loss, step=True)

Saving and Fetching the Trained Model

Apart from persisting output Ops, the trained model can also be saved on the Ravenverse. This can be done by calling the save_model function.


The saved model can be fetched from the Ravenverse by calling the fetch_persisting_op function.

my_net = R.fetch_persisting_op(op_name='my_net')

Complete example scripts of TorchScript Model loading and training can be found here:


Activating the Graph

Once the model has been defined (Functional/Sequential) and all required Ops for the Graph have been defined, the Graph can be activated and made ready for execution as follows:


Here is what should happen on activating the Graph (the script executed below is available here):


Executing the Graph

Once the Graph has been activated, no more Ops can be added to it. The Graph is now ready for execution. Once Ravop has been initialized with the token, the graph can be executed and tracked as follows:


Here is what should happen on executing the Graph (the script executed below is available here):



Retrieving Persisting Ops

As mentioned above, the batch losses for each epoch can be retrieved as and when they have been computed. The entire Graph need not be computed in order to view a persisting Op that has been computed. Any other Ops that have been made to persist, such as y_test_pred in the example above, can be retrieved as well.

epoch_no, batch_no = 1, 1
batch_loss = R.fetch_persisting_op(op_name="training_loss_epoch_{}_batch_{}".format(epoch_no, batch_no))
print("training_loss_epoch_{}_batch_{}: ".format(epoch_no, batch_no), batch_loss)

y_test_pred = R.fetch_persisting_op(op_name="test_prediction")
print("Test prediction: ", y_test_pred)

Note: The Ops that have been fetched are of type torch.Tensor.
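To fetch every batch loss, the requester needs the full list of op names. The hypothetical helper below builds them from the "training_loss_epoch_{}_batch_{}" pattern used above. It assumes one name per batch per epoch, with the batch count rounded up, and numbering that starts at 1 (as the example print suggests); adjust `start` if the actual numbering differs.

```python
import math

def batch_loss_op_names(n_samples, batch_size, n_epochs, start=1):
    """Hypothetical helper: list the persisting op names for every epoch/batch pair."""
    n_batches = math.ceil(n_samples / batch_size)   # a partial last batch still counts
    return ["training_loss_epoch_{}_batch_{}".format(e, b)
            for e in range(start, start + n_epochs)
            for b in range(start, start + n_batches)]

names = batch_loss_op_names(n_samples=101, batch_size=25, n_epochs=2)
print(len(names), names[0])  # 10 training_loss_epoch_1_batch_1
```

Each name can then be passed to R.fetch_persisting_op(op_name=...) once that loss has been computed.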



This project is licensed under the MIT License - see the LICENSE file for details


