Why doesn't the L2 loss sum over the errors? #193

brynhayder · 2021-10-06T16:56:31Z

The title is self-explanatory: why doesn't this loss include a sum (e.g. for multi-output regression)?

Answered by mkunesch

mkunesch · 2021-11-17T22:23:57Z

Hi! Thanks a lot for the question!

In my opinion, this has two main advantages:

Not every user needs the same reduction function. While it's easy to apply a reduction function to the unreduced optax loss, it wouldn't necessarily be easy to change a reduction that has already been applied. For example, if optax applied jnp.sum inside l2_loss, a user who wants the argmin of the per-element losses couldn't use the optax implementation.
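To make the first point concrete, here is a minimal sketch using only jax.numpy. The local `l2_loss` is a hypothetical stand-in written to mirror the elementwise behaviour of optax's l2_loss (0.5 times the squared error, same shape as the inputs); the arrays are made-up data for a 3-example, 2-output regression. Because nothing is reduced up front, the same per-element losses can feed a sum over outputs, a mean over the batch, or an argmin.

```python
import jax.numpy as jnp

def l2_loss(predictions, targets):
    # Hypothetical stand-in mirroring optax's elementwise l2_loss:
    # 0.5 * (predictions - targets) ** 2, returned unreduced.
    return 0.5 * (predictions - targets) ** 2

# Made-up data: 3 examples, 2 regression outputs each.
predictions = jnp.array([[1.0, 3.0], [3.0, 4.0], [0.0, 1.0]])
targets = jnp.array([[1.0, 1.0], [2.0, 3.0], [0.0, 1.0]])

per_element = l2_loss(predictions, targets)  # shape (3, 2), no reduction applied
per_example = per_element.sum(axis=-1)       # user's choice: sum over the outputs
batch_loss = per_example.mean()              # user's choice: mean over the batch
best_example = jnp.argmin(per_example)       # only possible because nothing was summed away
```

Here `per_example` is `[2.0, 1.0, 0.0]`, `batch_loss` is `1.0`, and `best_example` is `2`. Had the loss already returned a single scalar, the last line would have nothing to operate on.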
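The computation might be distributed. The loss might be the result of a reduction over data on different devices. If optax used, e.g., the common reduction jnp.mean inside the loss, we could get a slightly wrong result if we also took a jax.lax.pmean on top of it, because a mean of per-device means only equals the global mean when every device contributes the same number of elements. If no reduction is performed by default, we can choose the local reduction ourselves and apply jax.lax.pmean (or another collective) directly.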
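A minimal sketch of the distributed point, under stated assumptions: two simulated "devices" are modelled as the leading axis of an array and run with `jax.vmap(..., axis_name=...)`, which supports the same collectives (`jax.lax.pmean`, `jax.lax.psum`) that `jax.pmap` uses. The scenario is illustrative: device 1 holds only two real examples, with its remaining slots padded and masked out. `l2_loss`, `mean_then_pmean`, and `global_mean` are hypothetical helpers, not optax APIs.

```python
import jax
import jax.numpy as jnp

def l2_loss(predictions, targets):
    # Hypothetical stand-in mirroring optax's elementwise, unreduced l2_loss.
    return 0.5 * (predictions - targets) ** 2

# Two simulated devices (leading axis), 4 slots each. Device 1 has only
# 2 real examples; its last two slots are padding, marked by mask == 0.
predictions = jnp.array([[1.0, 2.0, 3.0, 4.0],
                         [2.0, 2.0, 0.0, 0.0]])
targets = jnp.zeros((2, 4))
mask = jnp.array([[1.0, 1.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 0.0]])

def mean_then_pmean(p, t, m):
    # Reduce to a local mean first (like a loss that already applies a mean),
    # then average those means across devices.
    local_mean = jnp.sum(l2_loss(p, t) * m) / jnp.sum(m)
    return jax.lax.pmean(local_mean, axis_name="devices")

def global_mean(p, t, m):
    # Start from unreduced losses: sum locally, then combine sums and counts.
    total = jax.lax.psum(jnp.sum(l2_loss(p, t) * m), axis_name="devices")
    count = jax.lax.psum(jnp.sum(m), axis_name="devices")
    return total / count

# vmap with axis_name lets the collectives run without multiple real devices.
biased = jax.vmap(mean_then_pmean, axis_name="devices")(predictions, targets, mask)
correct = jax.vmap(global_mean, axis_name="devices")(predictions, targets, mask)
```

Device 0's local mean is 15 / 4 = 3.75 and device 1's is 4 / 2 = 2.0, so the mean of means is 2.875, while the true mean over all six real examples is 19 / 6 ≈ 3.167. Both results are replicated on each simulated device, so each output has shape `(2,)`. Starting from unreduced losses is what leaves room to pick the sum-then-divide reduction.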

There might of course be other advantages/considerations too, but these were the first ones I could think of!

Hope this helps!
