SlowMo applies to the parameter averaging method in distributed training. It adds an additional momentum term that is used for the outer-loop updates (i.e. after parameter averaging).
The method is actually conceptually the same as BMUF. Only some of the experiments in the SlowMo paper go a bit beyond that.
Chen and Huo, “Scalable Training of Deep Learning Machines by Incremental Block Training with Intra-Block Parallel Optimization and Blockwise Model-Update Filtering.” (BMUF), ICASSP 2016
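To make the outer-loop update concrete, here is a minimal sketch in plain Python, assuming the update rule as I understand it from the SlowMo paper: the slow momentum accumulates the scaled difference between the parameters at the start of the outer step and the averaged parameters, and the new parameters are obtained by stepping from the start point along that momentum. The function names and the argument names (`lr`, `slow_lr`, `slow_beta`) are made up for illustration and are not the fairscale or Fairseq API.

```python
def average_params(worker_params):
    """Element-wise average of the parameter vectors of all workers."""
    n = len(worker_params)
    return [sum(vals) / n for vals in zip(*worker_params)]


def slowmo_outer_step(start, averaged, momentum, lr, slow_lr=1.0, slow_beta=0.5):
    """One outer-loop update (hypothetical helper, not a library function).

    start:    parameters at the beginning of the outer step
    averaged: parameters after the inner steps and parameter averaging
    momentum: slow momentum buffer carried over from the previous outer step
    lr:       inner-loop learning rate, used to scale the parameter difference
    """
    # Slow momentum: decay the old buffer and add the scaled pseudo-gradient.
    new_momentum = [
        slow_beta * u + (x0 - xa) / lr
        for u, x0, xa in zip(momentum, start, averaged)
    ]
    # Step from the start point along the slow momentum.
    new_params = [
        x0 - slow_lr * lr * u for x0, u in zip(start, new_momentum)
    ]
    return new_params, new_momentum


# Two workers that each ran some inner steps from the same start point.
start = [1.0, 2.0]
averaged = average_params([[0.8, 1.6], [0.6, 1.8]])

# With slow_beta=0 and slow_lr=1 this reduces to plain parameter averaging.
plain, _ = slowmo_outer_step(start, averaged, [0.0, 0.0], lr=0.1,
                             slow_lr=1.0, slow_beta=0.0)
```

With a non-zero `slow_beta`, the buffer acts as a filter over the successive averaged updates, which is where the similarity to the block-wise model-update filtering in BMUF shows up.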
The original implementation is in fairscale. The code is also in Fairseq.