Ravnest introduces a novel asynchronous parallel training approach that combines the best aspects of data and model parallelism. It enables distributed training of complex deep learning models on large datasets using clusters of heterogeneous, consumer-grade PCs connected over the internet. Designed with scalability and performance as key objectives, Ravnest aims to empower researchers and machine learning practitioners by simplifying the development and deployment of deep learning models, paving the way for innovative research and practical real-world applications.
Documentation: https://ravnest.readthedocs.io
Research Paper: https://arxiv.org/abs/2401.01728
pip install git+https://github.com/ravenprotocol/ravnest.git
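To quickly verify the installation, you can try importing the package (it installs under the name ravnest):
python -c "import ravnest"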
Clone the Repository:
git clone https://github.com/ravenprotocol/ravnest.git
Generate the submodel files:
python cluster_formation.py
NOTE: Uncomment the appropriate lines in cluster_formation.py for the CNN, ResNet-50, Inception-V3, GPT-Sorter or BERT model.
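For illustration, the model-selection block in cluster_formation.py roughly follows the pattern below. The module and class names here are placeholders, not the exact identifiers used in the script, so refer to the actual cluster_formation.py for the real imports and the cluster-formation call that consumes the chosen model:
# Illustrative sketch only -- module/class names are placeholders,
# not the exact identifiers used in cluster_formation.py.
# Uncomment exactly one model, matching the example you want to run:
from models import CNN_Net          # CNN example
# from models import ResNet50       # ResNet-50 example
# from models import InceptionV3    # Inception-V3 example
# from models import GPT_Sorter     # GPT-Sorter example
# from models import BERT_Model     # BERT example

model = CNN_Net()
# ...the rest of the script splits `model` into submodel files for the providers.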
Execution of Clients (in 3 terminals) for CNN:
Create 3 copies of the provider.py file inside the examples/cnn/ folder. Rename these files provider_0.py, provider_1.py and provider_2.py. In each file, set the name parameter of the Node() object to 'node_0', 'node_1' and 'node_2' respectively.
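For reference, the change in each copy looks roughly like the sketch below. Only the name argument comes from these instructions; the import path and the remaining Node() constructor arguments used in examples/cnn/provider.py are assumptions and will differ in the actual file:
# provider_0.py -- minimal sketch; the real examples/cnn/provider.py passes
# additional arguments (model, optimizer, data loaders, etc.) to Node().
from ravnest import Node  # assumed import path

node = Node(name='node_0')  # use 'node_1' in provider_1.py and 'node_2' in provider_2.py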
python examples/cnn/provider_0.py
python examples/cnn/provider_1.py
python examples/cnn/provider_2.py
NOTE: If you have installed Ravnest via pip, you will need to delete the ravnest subfolder in your cloned directory so that your scripts use the methods and classes from the pip-installed library.
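For example, on Linux/macOS, from inside the cloned directory (assuming the default layout, where the package code lives in a ravnest/ subfolder at the repository root):
rm -rf ravnest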
If you find Ravnest or its foundational components and algorithms useful in your research, please consider citing the following source:
@misc{menon2024ravnest,
      title={Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices},
      author={Anirudh Rajiv Menon and Unnikrishnan Menon and Kailash Ahirwar},
      year={2024},
      eprint={2401.01728},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}