Replies: 3 comments
- Hi @nikitakit, good to have you here :)
- (I'll convert this to a discussion; if we decide to add any new features we can file an issue afterwards.)
- You could sync the parameters across devices and make the first device the leading one:
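  A minimal sketch of what that sync could look like, assuming the parameters live in a pytree and process 0 is the leader. `broadcast_from_first_host` is a hypothetical helper, not an existing flax API; the idea is that exactly one device in the cluster contributes the real values and a global `psum` copies them everywhere:

  ```python
  import jax
  import numpy as np

  def broadcast_from_first_host(params):
      """Hypothetical helper: make every host's params a copy of process 0's.

      Only the first local device on process 0 contributes real values; every
      other device contributes zeros, so a psum across all devices leaves
      process 0's parameters on every device of every host.
      """
      n_local = jax.local_device_count()

      def per_device(x):
          x = np.asarray(x)
          out = np.zeros((n_local,) + x.shape, x.dtype)
          if jax.process_index() == 0:   # jax.host_id() on older JAX releases
              out[0] = x                 # exactly one device holds the real value
          return out

      stacked = jax.tree_util.tree_map(per_device, params)
      # In a multi-host pmap the named axis spans every device on every host,
      # so this psum is a global all-reduce.
      synced = jax.pmap(lambda x: jax.lax.psum(x, 'sync'), axis_name='sync')(stacked)
      # All devices now hold identical values; pull one copy back to the host.
      return jax.tree_util.tree_map(lambda x: jax.device_get(x)[0], synced)
  ```

  Newer JAX versions ship essentially this pattern as `jax.experimental.multihost_utils.broadcast_one_to_all`, so depending on your JAX version the helper above may not be needed.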
- Currently `optimizer.replicate()` will replicate a model to all devices on the current host, but flax doesn't provide any means to replicate model parameters in a multi-host setting. For multi-host training to work, parameters need to be initialized identically across all hosts. This requires discipline from the programmer to use the same random seed on all hosts, and to avoid using unseeded randomness (like numpy randomness) when initializing parameters (a minimal example follows at the end of this comment).
It would be helpful if there were a standard command that takes a copy of the model from one host and replicates it to all others.
  (For anyone wondering why I ran into this: I'm loading pre-trained BERT checkpoints saved with tensorflow or pytorch, except that I also need to add a task-specific head on top. Since most of the parameter loading happens outside of flax and deals with numpy arrays, it felt natural to just add code in the same place that calls `numpy.random()` to initialize the classifier head.)