Freezing a subset of the parameters using Flax and optax.masked #167

@matthias-wright I believe this is addressed by this issue here. As explained there, `optax.masked` essentially "zeros out" the gradient transformation (i.e., the gradients are not processed by the optimizer). The net result is that when we mask parameters, the gradient is not transformed before the update is applied, so the raw gradient becomes the update and the masked parameters still change. The solution, as explained in the above-mentioned issue, is not to use `optax.masked` but instead to use `optax.multi_transform`, where the updates for the frozen parameters are set to zero. As a bonus, if the training loop is jit-ed, the zeroed-out gradients are not even computed!
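
Here is a minimal sketch of that `multi_transform` approach. It assumes a recent Flax where `init` returns a plain dict; the two-layer `Model`, the layer names `frozen_layer`/`trainable_layer`, and the learning rate are made up for illustration:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax
from flax import traverse_util


class Model(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(16, name='frozen_layer')(x))
        return nn.Dense(1, name='trainable_layer')(x)


model = Model()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 8)))

# Label each parameter leaf 'frozen' or 'trainable' based on its path.
flat = traverse_util.flatten_dict(params, sep='/')
labels = traverse_util.unflatten_dict(
    {path: ('frozen' if 'frozen_layer' in path else 'trainable') for path in flat},
    sep='/')

# multi_transform routes each labeled subtree through its own transformation;
# set_to_zero() replaces the frozen parameters' updates with zeros.
tx = optax.multi_transform(
    {'trainable': optax.adam(1e-3), 'frozen': optax.set_to_zero()},
    labels)
opt_state = tx.init(params)

# One update step: the frozen_layer parameters are left untouched.
def loss_fn(p):
    return jnp.sum(model.apply(p, jnp.ones((1, 8))) ** 2)

grads = jax.grad(loss_fn)(params)
updates, opt_state = tx.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```

Because `set_to_zero()` produces zero updates for the frozen subtree, wrapping the step in `jax.jit` lets XLA prune the corresponding gradient computation entirely, which is where the bonus mentioned above comes from.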
