sparsity regularisation #235

renjithravindran · 2021-11-19T04:16:25Z


Hi, how can I add an L1 regularization to a subset of parameters?
Thanks.


mkunesch · 2021-11-19T13:14:36Z

mkunesch
Nov 19, 2021
Maintainer

Hi! Thanks a lot for the question!

The optax.masked wrapper can be used to transform only a subset of parameters using a masking function. The docstring of optax.masked has an example with L2 regularisation (using add_decayed_weights). For weight decay, using a mask is so common that add_decayed_weights has its own mask option, which uses optax.masked under the hood.

As far as I am aware, optax currently does not have a gradient transformation for L1 regularisation but this should be easy to implement by mirroring what happens in the L2 case (optax.add_decayed_weights). We should definitely add the L1 functionality too though; would you be keen to implement it and file a PR? No worries if not, I can also add it to our todo list or invite contributions by filing an issue.

I hope this helps! Let me know if you have any questions!

8 replies

renjithravindran Nov 20, 2021
Author

Cool! However, here are some early impressions, certainly uninformed...
Since, in JAX, things like regularization are treated as gradient transformations that need to be chained,
I suppose it becomes more difficult to add a non-standard regularization.
Elsewhere (tf/torch), to add a custom regularization one can simply write a new loss function and let autograd do the rest.
With JAX, I suppose one has to do some math to get the right gradient transform for the regularizer.
Is it possible to deviate from the JAX way of doing things and have a loss function with the regularizer included?

Going forward I will also need to consider a regularization scheme proposed here
Please let me know your thoughts on this.
Thanks!

mkunesch Nov 21, 2021
Maintainer

Hi! In JAX it is also possible to include the regularisation in the loss function and let autodiff handle it; I often do this myself, even for simple losses. Optax just provides an additional way of doing it directly in the optimizer. Which approach is better depends on taste and the particular application.

So yes, you can just add an L1 term to your loss directly (e.g. using functions from jax.tree_util if you have a tree of parameters, and haiku.data_structures.filter if you want to calculate the loss for only a subset of the parameters), but I also think we should provide the gradient transformation alternative in optax, similar to add_decayed_weights.
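
As a rough illustration of the loss-function route, the sketch below adds an L1 term over a subset of a parameter tree and lets jax.grad differentiate the whole thing. The linear model, the parameter names "w" and "b", and the l1_strength value are assumptions made for the example. The subset is picked with a plain dict lookup here; with Haiku parameters, haiku.data_structures.filter could play that role instead.

```python
import jax
import jax.numpy as jnp


def l1_penalty(tree):
    """Sum of absolute values over every leaf of a parameter tree."""
    leaves = jax.tree_util.tree_leaves(tree)
    return sum(jnp.sum(jnp.abs(leaf)) for leaf in leaves)


def loss_fn(params, x, y, l1_strength=0.01):
    # Toy linear model with a mean squared error data term.
    pred = x @ params["w"] + params["b"]
    mse = jnp.mean((pred - y) ** 2)
    # Penalise only the weights, not the bias.
    penalised = {"w": params["w"]}
    return mse + l1_strength * l1_penalty(penalised)


params = {"w": jnp.array([1.0, -2.0]), "b": jnp.array(0.5)}
x = jnp.ones((4, 2))
y = jnp.zeros(4)

# Autodiff handles the regulariser; no hand-derived gradient is needed.
grads = jax.grad(loss_fn)(params, x, y)
```

The resulting grads tree can be passed to any optax optimizer as usual, so no custom gradient transformation is required for this approach.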

renjithravindran Nov 21, 2021
Author

Great! And of course it is good to have L1 regularization readily available in optax, though I suppose it may not be in big demand in the contemporary DL space. Orthogonality regularization is another useful one. I am working on a niche application of tensor factorization to NLP, though I consider it fundamental. Tools like tensorly don't seem to cut it, so I decided to try my luck with SGD, sparsity and other regularizations.

Currently I am trying out optax to get SVD working. Only after that will I move on to tensor factorization and sparsity...
So it might take some time before I get to it. I seem to be liking the JAX way, and am happy to contribute whatever little I can.
Thanks a lot for your time.

mkunesch Nov 26, 2021
Maintainer

Cool, that all sounds great! I hope it goes well - let me know if you have any questions, comments, etc in the process.

> So it might take some time before i get to it.

No worries! I think I'll close the issue #237 I opened in the meantime so that we don't have a work-in-progress issue sitting around for too long but feel free to reopen when you start working on the sparsity regularisation.

Thanks a lot!

renjithravindran Nov 27, 2021
Author

Yep that is fine.
Btw, just thinking ahead... do you think it would be possible to do Hogwild!-style async SGD with JAX?
Hogwild! would need mutable, shared parameter arrays, so I suppose it is not in line with JAX principles?

Thanks.
