I'm working with the optimizers module in `jax.example_libraries`. I try two different learning rates (0.01 and 0.005), and both updates use the same gradient (`grad_1`). I then calculate the sum of absolute differences of all parameters before and after the update. However, the absolute difference when using a learning rate of 0.01 is not twice the absolute difference when using a learning rate of 0.005.
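Roughly what I'm doing (a minimal sketch, not my actual code; the parameter values, the gradient `grad_1`, and the shapes are made up for illustration):

```python
import jax
import jax.numpy as jnp
from jax.example_libraries import optimizers

# Made-up parameters and a fixed gradient (stand-ins for my real model).
params = {'w': jnp.full((1000,), 0.3), 'b': jnp.full((10,), 0.1)}
grad_1 = {'w': jnp.full((1000,), 0.7), 'b': jnp.full((10,), 0.2)}

def total_abs_change(lr):
    # One plain SGD step with the given learning rate, same gradient each time.
    opt_init, opt_update, get_params = optimizers.sgd(lr)
    opt_state = opt_init(params)
    new_params = get_params(opt_update(0, grad_1, opt_state))
    # Sum of absolute differences over all parameters.
    return sum(jnp.sum(jnp.abs(new - old))
               for old, new in zip(jax.tree_util.tree_leaves(params),
                                   jax.tree_util.tree_leaves(new_params)))

d_big = total_abs_change(0.01)
d_small = total_abs_change(0.005)
print(d_big, d_small, d_big / d_small)  # I expected the ratio to be exactly 2.0
```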
Thank you so much for your help. Appreciate it!
-
Thanks for the question! Could you share a fully runnable repro?
As you can see from the code, the sgd update isn't doing much.
It's hard to guess without having a repro, but it could just be standard precision loss from floating-point accumulation. If so, you may be able to reorder the computations to be more stable, but an easy way to check would be to set `jax.config.update('jax_enable_x64', True)` at the top of your file and see if the answers change.
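Something like this (just a sketch of where the flag goes and of the kind of rounding involved; the values 0.3 and 0.7 are arbitrary stand-ins for a parameter and a gradient entry):

```python
import jax
# This must run before any arrays are created; comment it out to see the
# default float32 behaviour again.
jax.config.update('jax_enable_x64', True)
import jax.numpy as jnp

p = jnp.asarray(0.3)   # one parameter value
g = jnp.asarray(0.7)   # the corresponding gradient entry

# |new - old| for each learning rate. Each subtraction rounds to the working
# precision, so the recovered update is not exactly lr * g.
d1 = jnp.abs((p - 0.01 * g) - p)
d2 = jnp.abs((p - 0.005 * g) - p)
print(d1 / d2)  # often not exactly 2.0 in float32; much closer with x64 enabled
```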