-
Yes, you can linearize or manually implement it. In these lines of code:

```python
_, JtV = jax.jvp(lambda th: E_jacob(x, th, rec, params), (theta_vec,), (theta_vec,))
_, e_vjp = jax.vjp(lambda th: E_jacob(x, th, rec, params), theta_vec)
Gv = e_vjp(HyJtV)[0]
```

change to:

```python
f_val, f_jvp = jax.linearize(lambda th: E_jacob(x, th, rec, params), theta_vec)
JtV = f_jvp(gradient)
# Transpose the linear JVP to obtain a true VJP (a JVP is not its own
# transpose unless the Jacobian is symmetric)
f_vjp = jax.linear_transpose(f_jvp, theta_vec)
Gv = f_vjp(HyJtV)[0]
```

With this, the Jacobian computed by jax.linearize is reused, saving computation and speeding up the code.
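To make this concrete, here is a minimal self-contained sketch of the single-linearization GGN-vector product; the model `f`, the output-Hessian apply `H_out`, and the toy data are illustrative placeholders, not the code from this thread:

```python
import jax
import jax.numpy as jnp

def ggn_vector_product_single_linearize(f, theta, v, H_out):
    """Compute (J^T H J) v while linearizing f only once."""
    # One linearization: f_jvp(u) computes J @ u for the Jacobian J at theta.
    _, f_jvp = jax.linearize(f, theta)
    # Transpose the linear map to get a VJP without linearizing again.
    f_vjp = jax.linear_transpose(f_jvp, theta)
    Jv = f_jvp(v)              # J v
    (Gv,) = f_vjp(H_out(Jv))   # J^T H (J v)
    return Gv

# Toy usage with an identity output Hessian (Gauss-Newton for least squares).
W = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
f = lambda th: jnp.tanh(W @ th)
theta = jnp.array([0.1, -0.2])
v = jnp.array([1.0, 0.5])
print(ggn_vector_product_single_linearize(f, theta, v, lambda y: y))
```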
-
Thank you
judith valenzuela
-
Dear All,
I would like to ask for advice on how to accelerate the computation of GGN-vector products. I wrote what I believe should be a reasonably efficient implementation of a GGN-vector product using one `jvp` and one `vjp` (a rough sketch of the pattern is below). My understanding is that my code linearizes the function twice, once for the `jvp` and then again for the `vjp` (although the whole code is JIT compiled, so maybe XLA is able to reuse all the Jacobians under the hood).
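Roughly, my implementation follows this pattern (the model and output Hessian below are toy stand-ins, not my actual code):

```python
import jax
import jax.numpy as jnp

def ggn_vector_product(f, theta, v, H_out):
    """Compute the GGN-vector product (J^T H J) v, where J is the
    Jacobian of f at theta and H_out applies the output Hessian H."""
    _, Jv = jax.jvp(f, (theta,), (v,))   # first linearization: J v
    _, f_vjp = jax.vjp(f, theta)         # second linearization, for J^T
    (Gv,) = f_vjp(H_out(Jv))             # J^T H (J v)
    return Gv

# Toy stand-ins: a tiny model and an identity output Hessian.
W = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
f = lambda th: jnp.tanh(W @ th)
theta = jnp.array([0.1, -0.2])
v = jnp.array([1.0, 0.5])
print(jax.jit(ggn_vector_product, static_argnums=(0, 3))(f, theta, v, lambda y: y))
```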
However, the autodiff cookbook suggests that the GGN-vector product can be accelerated by linearizing the function just once (reusing the Jacobian from the `jvp` computation to calculate the `vjp`). As far as I understand, I can use `jax.linearize` instead of `jax.jvp`, so I get an `f_jvp` function instead of a single `jvp`. So, I guess my question is whether there is a simple way to transpose the `f_jvp` output of `jax.linearize` into an `f_vjp` function equivalent to the output of `jax.vjp`.
Any advice would be super welcome!
Best,
Jan