A tiny proof-of-concept implementation of distributed training for sequential deep learning models, implemented in plain NumPy & mpi4py.
Currently implements:
- Sequential models / deep MLPs, trained with SGD.
- Data parallel training with interleaved communication & computation, similar to PyTorch's DistributedDataParallel (see the all-reduce sketch after this list).
- Pipeline parallel training:
  - Naive schedule without interleaved stages.
  - GPipe schedule with interleaved FWD & interleaved BWD passes (see the schedule sketch after this list).
  - (soon) PipeDream-Flush schedule with additional interleaving between FWD & BWD passes.
- Any combination of DP & PP algorithms.
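For a flavor of the data parallel side, here is a minimal sketch of gradient averaging across replicas with mpi4py & NumPy. It is illustrative only: the gradient shapes and the use of non-blocking `Iallreduce` for overlap are assumptions, not this repo's actual API.

```python
# Illustrative sketch of data-parallel gradient averaging; not this repo's actual API.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
world_size = comm.Get_size()

# Stand-in per-layer gradients, listed in the order backprop produces them
# (last layer first); the shapes are made up for the example.
grads = [np.random.randn(64, 32), np.random.randn(32, 16)]

# Launch a non-blocking all-reduce for each gradient as soon as it is ready,
# so communication can overlap with computing the remaining gradients.
requests = [comm.Iallreduce(MPI.IN_PLACE, g, op=MPI.SUM) for g in grads]

# Wait for every reduction to finish, then average across replicas.
MPI.Request.Waitall(requests)
for g in grads:
    g /= world_size
```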
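The pipeline side boils down to the order in which each stage runs its microbatch forwards & backwards. Below is a minimal sketch of that order under a GPipe-style schedule (all forwards, then all backwards); the function name & tuple format are illustrative, not this repo's actual scheduler.

```python
# Illustrative sketch of a GPipe-style per-stage schedule; not this repo's actual scheduler.
def gpipe_stage_ops(num_microbatches):
    """One pipeline stage runs the forward pass for every microbatch first,
    then every backward pass. Around each op, activations (forward) and
    gradients (backward) are exchanged with neighboring stages via
    point-to-point sends & receives."""
    forwards = [("fwd", mb) for mb in range(num_microbatches)]
    # Backwards shown in reverse microbatch order; implementations differ
    # on this detail.
    backwards = [("bwd", mb) for mb in reversed(range(num_microbatches))]
    return forwards + backwards

print(gpipe_stage_ops(4))
# [('fwd', 0), ('fwd', 1), ('fwd', 2), ('fwd', 3),
#  ('bwd', 3), ('bwd', 2), ('bwd', 1), ('bwd', 0)]
```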
```bash
# Set up the environment & install the package in editable mode
conda env create
pip install -e .
# M1 Macs: conda install "libblas=*=*accelerate"

# Download the dataset
python download_dataset.py

# Run the test suite
pytest
```
```bash
# Sequential training
python train.py

# Data parallel distributed training
mpirun -n 4 python train.py --dp 4

# Pipeline parallel distributed training
mpirun -n 4 python train.py --pp 4 --schedule naive

# Data & pipeline parallel distributed training
mpirun -n 8 python train.py --dp 2 --pp 4 --schedule gpipe
```
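Note that the total number of MPI ranks must equal `dp * pp`. Below is a minimal sketch of how ranks could be split into a data-parallel × pipeline-parallel grid with MPI communicator splits; the mapping & names are illustrative, not necessarily how this repo assigns ranks.

```python
# Illustrative rank-to-grid mapping for combined DP & PP;
# not necessarily how this repo assigns ranks.
from mpi4py import MPI

def build_grid(dp, pp):
    world = MPI.COMM_WORLD
    assert world.Get_size() == dp * pp, "mpirun -n must equal dp * pp"
    rank = world.Get_rank()
    pp_rank = rank % pp    # position (stage) within one pipeline
    dp_rank = rank // pp   # which data-parallel replica this rank belongs to
    # Ranks sharing dp_rank form one pipeline and exchange activations/gradients
    # via point-to-point messages; ranks sharing pp_rank hold the same stage
    # and all-reduce that stage's gradients.
    pipeline_comm = world.Split(color=dp_rank, key=pp_rank)
    dp_comm = world.Split(color=pp_rank, key=dp_rank)
    return dp_comm, pipeline_comm
```

With `mpirun -n 8` and `build_grid(2, 4)`, for example, each rank ends up in a pipeline communicator of size 4 and a data-parallel communicator of size 2.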