Introducing Raven Protocol's Distributed Deep Learning tool that allows Requesters to easily build, train and test their neural networks by leveraging the compute power of participating nodes across the globe.
RavDL can be thought of as a high-level, Python-based wrapper that defines the mathematical backend for building neural network layers. It uses the fundamental operations from the Ravop library to provide the essential abstractions needed for training complex DL architectures in the Ravenverse.
This framework seamlessly integrates with the Ravenverse, where models are divided into optimized subgraphs that are assigned to participating nodes for computation in a secure manner. Once all subgraphs have been computed, the saved model is returned to the requester.
In this manner, a requester can securely train complex models without dedicating their own system to this heavy and time-consuming task.
There is something in it for the providers too! Nodes that contribute their processing power are rewarded with tokens proportional to the capabilities of their systems and the duration of their participation. More information is available here.
Make sure Ravop is installed and working properly.
pip install ravdl
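A quick way to confirm the installation is that both packages import cleanly (a simple check, nothing more):

# Sanity check: both packages should import without errors
import ravop
import ravdl
print("ravop and ravdl imported successfully")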
Dense(n_units, initial_W=None, initial_w0=None, use_bias=True)
- n_units: Output dimension of the layer
- initial_W: Initial weights of the layer
- initial_w0: Initial bias of the layer
- use_bias: Whether to use bias or not
- Input: (batch_size, ..., input_dim)
- Output: (batch_size, ..., n_units)
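For illustration, a Dense layer can be added to a Sequential model like this (a minimal sketch; the layer sizes are arbitrary, and the Sequential API itself is shown in full later in this document):

from ravdl.v2 import NeuralNetwork
from ravdl.v2.optimizers import Adam
from ravdl.v2.layers import Dense

# Map 10 input features to 16 hidden units, then down to 3 outputs
model = NeuralNetwork(optimizer=Adam(), loss='SquareLoss')
model.add(Dense(16, input_shape=(10,)))  # the first layer is given an input_shape
model.add(Dense(3))                      # later layers infer their input dimension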
BatchNormalization1D(momentum=0.99, epsilon=0.01, affine=True, initial_gamma=None, initial_beta=None, initial_running_mean=None, initial_running_var=None)
- momentum: Momentum for the moving average and variance
- epsilon: Small value to avoid division by zero
- affine: Whether to learn the scaling and shifting parameters
- initial_gamma: Initial scaling parameter
- initial_beta: Initial shifting parameter
- initial_running_mean: Initial running mean
- initial_running_var: Initial running variance
- Input: (batch_size, channels) or (batch_size, channels, length)
- Output: same as input
BatchNormalization2D(num_features, momentum=0.99, epsilon=0.01, affine=True, initial_gamma=None, initial_beta=None, initial_running_mean=None, initial_running_var=None)
- num_features: Number of channels in the input
- momentum: Momentum for the moving average and variance
- epsilon: Small value to avoid division by zero
- affine: Whether to learn the scaling and shifting parameters
- initial_gamma: Initial scaling parameter
- initial_beta: Initial shifting parameter
- initial_running_mean: Initial running mean
- initial_running_var: Initial running variance
- Input: (batch_size, channels, height, width)
- Output: same as input
LayerNormalization(normalized_shape=None, epsilon=1e-5, initial_W=None, initial_w0=None)
- normalized_shape: Shape of the input, or an integer representing the last dimension of the input
- epsilon: Small value to avoid division by zero
- initial_W: Initial weights of the layer
- initial_w0: Initial bias of the layer
- Input: (batch_size, ...)
- Output: same as input
Dropout(p=0.5)
- p: Probability of dropping out a unit
- Input: any shape
- Output: same as input
Activation(name='relu')
- name: Name of the activation function

Currently Supported: 'relu', 'sigmoid', 'tanh', 'softmax', 'leaky_relu', 'elu', 'selu', 'softplus', 'softsign', 'tanhshrink', 'logsigmoid', 'hardshrink', 'hardtanh', 'softmin', 'softshrink', 'threshold'
- Input: any shape
- Output: same as input
Conv2D(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', initial_W=None, initial_w0=None)
- in_channels: Number of channels in the input image
- out_channels: Number of channels produced by the convolution
- kernel_size: Size of the convolving kernel
- stride: Stride of the convolution
- padding: Padding added to all 4 sides of the input (int, tuple or string)
- dilation: Spacing between kernel elements
- groups: Number of blocked connections from input channels to output channels
- bias: If True, adds a learnable bias to the output
- padding_mode: 'zeros', 'reflect', 'replicate' or 'circular'
- initial_W: Initial weights of the layer
- initial_w0: Initial bias of the layer
- Input: (batch_size, in_channels, height, width)
- Output: (batch_size, out_channels, new_height, new_width)
Flatten(start_dim=1, end_dim=-1)
- start_dim: First dimension to flatten
- end_dim: Last dimension to flatten
- Input: (batch_size, ...)
- Output: (batch_size, flattened_dimension)
MaxPooling2D(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
- kernel_size: Size of the max pooling window
- stride: Stride of the max pooling window
- padding: Zero-padding added to both sides of the input
- dilation: Spacing between kernel elements
- return_indices: If True, will return the max indices along with the outputs
- ceil_mode: If True, will use ceil instead of floor to compute the output shape
- Input: (batch_size, channels, height, width)
- Output: (batch_size, channels, new_height, new_width)
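The new_height and new_width above follow the standard convolution/pooling output-size formula. The small helper below is an illustration only (not part of RavDL) for working out these shapes ahead of time:

import math

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1, ceil_mode=False):
    # Standard output-size formula for Conv2D / MaxPooling2D style layers
    numerator = size + 2 * padding - dilation * (kernel_size - 1) - 1
    value = numerator / stride + 1
    return math.ceil(value) if ceil_mode else math.floor(value)

# A 28x28 input with kernel_size=3 and padding=1 stays 28x28,
# then a 2x2 max pool with stride 2 halves it to 14x14.
print(conv_output_size(28, kernel_size=3, padding=1))  # 28
print(conv_output_size(28, kernel_size=2, stride=2))   # 14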
Embedding(vocab_size, embed_dim, initial_W=None)
- vocab_size: Size of the vocabulary
- embed_dim: Dimension of the embedding
- initial_W: Initial weights of the layer
- Input: (batch_size, sequence_length)
- Output: (batch_size, sequence_length, embed_dim)
RMSprop(lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
- lr: Learning rate
- alpha: Smoothing constant
- eps: Term added to the denominator to improve numerical stability
- weight_decay: Weight decay (L2 penalty)
- momentum: Momentum factor
- centered: If True, compute the centered RMSprop, in which the gradient is normalized by an estimate of its variance
Adam(lr=0.001, betas=(0.9,0.999), eps=1e-08, weight_decay=0, amsgrad=False)
- lr: Learning rate
- betas: Coefficients used for computing running averages of the gradient and its square
- eps: Term added to the denominator to improve numerical stability
- weight_decay: Weight decay (L2 penalty)
- amsgrad: If True, use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond"
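Either optimizer is instantiated and handed to the model at construction time; a minimal sketch (the hyperparameter overrides are purely illustrative):

from ravdl.v2 import NeuralNetwork
from ravdl.v2.optimizers import RMSprop, Adam

# Sequential models take the optimizer in the NeuralNetwork constructor
model = NeuralNetwork(optimizer=RMSprop(lr=0.01), loss='SquareLoss')

# Functional models receive it via initialize_params(optimizer) in __init__,
# and Pytorch_Model via model.initialize(optimizer) -- both shown later in this document.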
- Mean Squared Error
ravop.square_loss(y_true, y_pred)
- Cross Entropy
ravop.cross_entropy_loss(y_true, y_pred, ignore_index=None, reshape_target=None, reshape_label=None)
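These losses are ordinary Ravop Ops and can be applied directly to tensors created with R.t(); a minimal sketch with arbitrary sample values:

import numpy as np
import ravop as R

y_true = R.t(np.array([[0., 1.], [1., 0.]]))
y_pred = R.t(np.array([[0.1, 0.9], [0.8, 0.2]]))

mse = R.square_loss(y_true, y_pred)
mse.persist_op(name='mse_loss')  # persist the Op so it can be fetched once computed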
This section gives a more detailed walkthrough of how a requester can define their ML/DL architectures in Python using RavDL and Ravop functionalities.
Note: The complete scripts of the functionalities demonstrated in this document are available in the Ravenverse Repository.
The Requester must connect to the Ravenverse using a unique token that they can generate by logging in on Raven's Website using their MetaMask wallet credentials.
import ravop as R
R.initialize('<TOKEN>')
In the Ravenverse, each script executed by a requester is treated as a collection of Ravop Operations called a Graph.
Note: In the current release, a requester can execute only one graph with their unique token. Therefore, to clear any previous/existing graphs, the requester must call the R.flush() method.
The next step involves the creation of a Graph...
R.flush()
R.Graph(name='cnn', algorithm='convolutional_neural_network', approach='distributed')
Note: The name and algorithm parameters can be set to any string. However, the approach parameter needs to be set to either "distributed" or "federated".
The current release of RavDL supports TorchScript, Functional and Sequential model definitions.
The requester can now utilize GPU resources on provider nodes by setting the gpu_required parameter to 'yes' in the Graph definition.
R.Graph(name='cnn', algorithm='convolutional_neural_network', approach='distributed', gpu_required='yes')
Important: This feature is currently in beta and is available only for the TorchScript Model Definition.
The latest release of RavDL supports the definition of custom layers, allowing requesters to write their own application-specific layers either from scratch or as a composition of existing layers.
The custom layer can be defined by inheriting the CustomLayer class from the ravdl.v2.layers module. The class defined by the requester must implement the methods shown below:
from ravdl.v2.layers import CustomLayer, Dense, BatchNormalization1D, Dropout

class CustomLayer1(CustomLayer):
    def __init__(self) -> None:
        super().__init__()
        # n_hidden and n_features are defined elsewhere in the requester's script
        self.d1 = Dense(n_hidden, input_shape=(n_features,))
        self.bn1 = BatchNormalization1D(momentum=0.99, epsilon=0.01)

    def _forward_pass_call(self, input, training=True):
        o = self.d1._forward_pass(input)
        o = self.bn1._forward_pass(o, training=training)
        return o

class CustomLayer2(CustomLayer):
    def __init__(self) -> None:
        super().__init__()
        self.d1 = Dense(30)
        self.dropout = Dropout(0.9)
        self.d2 = Dense(3)

    def _forward_pass_call(self, input, training=True):
        o = self.d1._forward_pass(input)
        o = self.dropout._forward_pass(o, training=training)
        o = self.d2._forward_pass(o)
        return o
The custom model class can be defined by inheriting the Functional class from the ravdl.v2 module. This feature allows the requester to define their own model class by composing custom and existing layers.
The class defined by the requester must implement the methods shown below:
from ravdl.v2 import Functional
from ravdl.v2.layers import Activation

class ANNModel(Functional):
    def __init__(self, optimizer):
        super().__init__()
        self.custom_layer1 = CustomLayer1()
        self.custom_layer2 = CustomLayer2()
        self.act = Activation('softmax')
        self.initialize_params(optimizer)

    def _forward_pass_call(self, input, training=True):
        o = self.custom_layer1._forward_pass(input, training=training)
        o = self.custom_layer2._forward_pass(o, training=training)
        o = self.act._forward_pass(o)
        return o
Note: The initialize_params method must be called in the __init__ method of the custom model class. This method initializes the parameters of the model and sets the optimizer for the model.
The requester can now define their training loop by using the batch_iterator function from the ravdl.v2.utils module. This function takes the input and target data as arguments and returns a generator that yields a batch of data at each iteration.
Note that the _forward_pass() and _backward_pass() methods of the custom model class must be called in the training loop.
import ravop as R
from ravdl.v2.optimizers import Adam
from ravdl.v2.utils import batch_iterator

optimizer = Adam()
model = ANNModel(optimizer)

epochs = 100
for i in range(epochs):
    for X_batch, y_batch in batch_iterator(X, y, batch_size=25):
        X_t = R.t(X_batch)
        y_t = R.t(y_batch)
        out = model._forward_pass(X_t)
        loss = R.square_loss(y_t, out)
        model._backward_pass(loss)

# Run inference on the test set and persist the prediction Op
out = model._forward_pass(R.t(X_test), training=False)
out.persist_op(name="prediction")
Note: The _forward_pass() method takes an additional argument training, which is set to True by default. This argument is used to determine whether the model is in training mode or not. The _forward_pass() method must be called with training=False when making predictions.
Complete example scripts for the Functional Model can be found here:
from ravdl.v2 import NeuralNetwork
from ravdl.v2.optimizers import RMSprop, Adam
from ravdl.v2.layers import Activation, Dense, BatchNormalization1D, Dropout, Conv2D, Flatten, MaxPooling2D
model = NeuralNetwork(optimizer=RMSprop(), loss='SquareLoss')
model.add(Dense(n_hidden, input_shape=(n_features,)))
model.add(BatchNormalization1D())
model.add(Dense(30))
model.add(Dropout(0.9))
model.add(Dense(3))
model.add(Activation('softmax'))
You can view a summary of the model in tabular format:
model.summary()
train_err = model.fit(X, y, n_epochs=5, batch_size=25)
By default, the batch losses for each epoch are made to persist in the Ravenverse and can be retrieved later, as and when those loss computations are completed.
If required, model inference can be tested by using the predict function. The output is stored as an Op and should be made to persist in order to view it later on.
y_test_pred = model.predict(X_test)
y_test_pred.persist_op(name='test_prediction')
RavDL now supports direct loading and training of PyTorch models on the Ravenverse using our powerful distribution strategies.
Note: The Requester must first convert their PyTorch model to a TorchScript model using the torch.jit.script function. We recommend referring to the PyTorch documentation for more information on TorchScript.
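For reference, here is a minimal sketch of that conversion step; the Net module and its layer sizes are hypothetical, and only torch.jit.script and save() are the essential calls:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 30)
        self.fc2 = nn.Linear(30, 3)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

scripted = torch.jit.script(Net())  # compile the model to TorchScript
scripted.save('test_model.pt')      # the .pt file used in the next step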
The Model Op takes a TorchScript model file (.pt) as an argument and creates an Op that can be used to train the model on the Ravenverse.
import ravop as R
model_op = R.model('test_model.pt')
This model op can now be loaded directly into a RavDL model and further used to define the training loop.
import numpy as np
import ravop as R
from ravdl.v2 import Pytorch_Model
from ravdl.v2.optimizers import Adam
from ravdl.v2.utils import batch_iterator

optimizer = Adam()
model = Pytorch_Model(model_op=model_op)
model.initialize(optimizer)

epochs = 100  # example epoch count
for i in range(epochs):
    for X_batch, y_batch in batch_iterator(X, y, batch_size=256):
        X_t = R.t(X_batch.astype(np.float32))
        y_t = R.t(y_batch.astype(np.float32))
        out = model._forward_pass(X_t)
        loss = R.square_loss(y_t, out)
        # Set step=True whenever the optimizer step needs to be called after backprop (defaults to True).
        model._backward_pass(loss, step=True)
Apart from persisting output Ops, the trained model can also be saved on the Ravenverse. This can be done by calling the save_model function.
model.save_model(name='my_net')
The saved model can be fetched from the Ravenverse by calling the fetch_persisting_op function.
my_net = R.fetch_persisting_op(op_name='my_net')
Complete example scripts of TorchScript Model loading and training can be found here:
Once the model (Functional/Sequential) and all required Ops for the Graph have been defined, the Graph can be activated and made ready for execution as follows:
R.activate()
Here is what should happen on activating the Graph (the script executed below is available here):
Once the Graph has been activated, no more Ops can be added to it. The Graph is now ready for execution. Once Ravop has been initialized with the token, the graph can be executed and tracked as follows:
R.execute()
R.track_progress()
Here is what should happen on executing the Graph (the script executed below is available here):
As mentioned above, the batch losses for each epoch can be retrieved as and when they have been computed. The entire Graph need not be computed in order to view a persisting Op that has been computed. Any other Ops that have been made to persist, such as y_test_pred in the example above, can be retrieved as well.
batch_loss = R.fetch_persisting_op(op_name="training_loss_epoch_{}_batch_{}".format(epoch_no, batch_no))
print("training_loss_epoch_1_batch_1: ", batch_loss)
y_test_pred = R.fetch_persisting_op(op_name="test_prediction")
print("Test prediction: ", y_test_pred)
Note: The Ops that have been fetched are of type torch.Tensor.
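Since the fetched values are torch.Tensor objects, they can be converted to NumPy arrays with standard PyTorch calls if needed (a small sketch):

# Convert a fetched Op to a NumPy array for downstream use
y_test_pred = R.fetch_persisting_op(op_name="test_prediction")
y_test_pred_np = y_test_pred.detach().numpy()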
This project is licensed under the MIT License - see the LICENSE file for details