SlowMo (BMUF) support for PyTorch distributed training #1553

albertz · 2024-06-26T08:01:57Z

This concerns the parameter-averaging method in distributed training. SlowMo adds a momentum term that is applied in the outer-loop update, i.e. after each parameter-averaging step.
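To make the outer-loop update concrete, here is a minimal, framework-agnostic sketch of one SlowMo step as I understand it from the paper. It uses plain Python lists instead of tensors, and all names (`average_params`, `slowmo_outer_step`, `slow_lr`, `slow_momentum`) are made up for illustration; they are not the fairscale or Fairseq API.

```python
from typing import List, Tuple


def average_params(worker_params: List[List[float]]) -> List[float]:
    """Element-wise mean of the parameter vectors from all workers."""
    n = len(worker_params)
    return [sum(vals) / n for vals in zip(*worker_params)]


def slowmo_outer_step(
    start: List[float],       # params at the start of the block (before local steps)
    averaged: List[float],    # params after local steps and averaging across workers
    slow_buf: List[float],    # slow momentum buffer, zeros at the first outer step
    base_lr: float,           # learning rate of the inner (base) optimizer
    slow_lr: float = 1.0,     # hypothetical name for the outer learning rate
    slow_momentum: float = 0.5,  # hypothetical name for the outer momentum
) -> Tuple[List[float], List[float]]:
    """One outer update: returns (new_params, new_slow_buf)."""
    # The block's total displacement, rescaled by base_lr, acts as a pseudo-gradient.
    new_buf = [
        slow_momentum * u + (x0 - xa) / base_lr
        for u, x0, xa in zip(slow_buf, start, averaged)
    ]
    # Move from the block start along the momentum buffer.
    new_params = [x0 - slow_lr * base_lr * u for x0, u in zip(start, new_buf)]
    return new_params, new_buf
```

With `slow_momentum=0` and `slow_lr=1` this reduces to plain parameter averaging, so the averaging code path could stay as it is and the slow momentum would only be an extra step after it.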

Wang et al., “SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum”, ICLR 2020. arXiv, OpenReview.

Original fairscale code. Code also in Fairseq.

The method is actually conceptually the same as BMUF. Only some of the experiments in the SlowMo paper go a bit beyond that.

Chen and Huo, “Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-Block Parallel Optimization and Blockwise Model-Update Filtering.” (BMUF), ICASSP 2016
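A rough sketch of why the two coincide, written from my reading of both papers (notation simplified, constant base learning rate $\gamma$ assumed, and only the classical block-momentum variant of BMUF considered). SlowMo, with slow learning rate $\alpha$, slow momentum $\beta$, block start $x_{t,0}$ and averaged parameters $x_{t,\tau}$ after $\tau$ local steps:

$$
u_{t+1} = \beta\, u_t + \frac{1}{\gamma}\left(x_{t,0} - x_{t,\tau}\right), \qquad
x_{t+1,0} = x_{t,0} - \alpha \gamma\, u_{t+1}
$$

BMUF, with block momentum $\eta$, block learning rate $\zeta$, previous global model $W_{t-1}$ and averaged model $\bar{W}_t$:

$$
G_t = \bar{W}_t - W_{t-1}, \qquad
\Delta_t = \eta\, \Delta_{t-1} + \zeta\, G_t, \qquad
W_t = W_{t-1} + \Delta_t
$$

Setting $\Delta_t = -\alpha \gamma\, u_{t+1}$ in the SlowMo update gives

$$
\Delta_t = \beta\, \Delta_{t-1} + \alpha \left(x_{t,\tau} - x_{t,0}\right),
$$

which is the BMUF update with $\eta = \beta$, $\zeta = \alpha$, $W_{t-1} = x_{t,0}$ and $\bar{W}_t = x_{t,\tau}$.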

albertz added PyTorch MultiGPU labels Jun 26, 2024
