Efficient way to compute Jacobian in nested AD #963
Use `batched_jacobian`.
The only thing to be aware of is that it gives you a 3D array (essentially a uniform block-diagonal matrix without storing the zeros).
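For reference, here is a minimal sketch of that suggestion (the model, sizes, and seeds below are illustrative assumptions, not from this thread):

```julia
using ADTypes, Lux, ForwardDiff, Random

# Illustrative model: 3 inputs -> 2 outputs, batch of 5 samples.
model = Chain(Dense(3 => 8, tanh), Dense(8 => 2))
ps, st = Lux.setup(Xoshiro(0), model)
smodel = StatefulLuxLayer{true}(model, ps, st)  # wrapper used by Lux for nested AD

x = randn(Xoshiro(1), Float32, 3, 5)

# Instead of a (2*5)×(3*5) block-diagonal matrix that is mostly zeros,
# batched_jacobian returns one 2×3 block per sample, stored as a 2×3×5 array.
J = batched_jacobian(smodel, AutoForwardDiff(), x)
@assert size(J) == (2, 3, 5)
```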
Lovely, just like this I guess? Thank you @avik-pal! Would it make sense to update the documentation with this? Happy to open a PR.
Yes, add a section after the full Jacobian one.
Even with this improvement in how the calculation of the Jacobian is batched, I still observe that in this example the training of the UDE suffers drastically in running time due to the regularization term. I was expecting the adjoint method on the numerical solver to be more expensive than the computation of the Jacobian of the NN with respect to the input layer for <100 input values (plus its differentiation during the reverse pass). Does this make sense to you @avik-pal? Happy to provide the example.
That is not generally true. Just computing the Jacobian will take roughly (100 / (batch_size * chunksize)) JVPs, and differentiating that would take the same number of JVPs over VJPs. The adjoint method is roughly bottlenecked by N VJPs, where N is the number of (backward) timesteps. Unless the ODE is extremely expensive to solve (and, similarly, the adjoint is expensive), the former will be more expensive. For regularization, I would generally recommend using some form of stochastic approximation built from JVPs or VJPs.
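For instance, since E[‖Jv‖²] = ‖J‖_F² for v with i.i.d. zero-mean, unit-variance entries, a Hutchinson-style estimator of a Frobenius-norm regularizer needs only a few JVPs per step. A sketch of this idea (the helper function and its defaults are assumptions for illustration, not Lux API):

```julia
using ADTypes, Lux, ForwardDiff, Random

# Hutchinson-style estimator of the squared Frobenius norm of the Jacobian:
# for v with i.i.d. zero-mean, unit-variance entries, E[|Jv|^2] = |J|_F^2,
# so a few JVPs give an unbiased estimate without materializing J.
function jacobian_frobenius_reg(smodel, x, rng; nsamples::Int=1)
    est = zero(eltype(x))
    for _ in 1:nsamples
        v = randn(rng, eltype(x), size(x))
        Jv = jacobian_vector_product(smodel, AutoForwardDiff(), x, v)
        est += sum(abs2, Jv)
    end
    return est / nsamples
end
```

Here `smodel` is assumed to be a `StatefulLuxLayer`, so that the estimator can itself be differentiated through during training via Lux's nested-AD machinery.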
Since there is nothing actionable here, I am closing this. Feel free to post any questions at https://github.com/orgs/LuxDL/discussions, and I can take a look at them.
Hi! I was looking at the example in the docs about how to perform nested AD with Lux. The code in the documentation definitely works, and I have included a full example on Discourse for completeness. However, I noticed that when we evaluate the Jacobian this gives us a Jacobian mostly filled with zeros, given that outputs of different inputs are not algebraically related (that is, there is no need to compute the full matrix).
For the following piece of code, this is how `J` looks like:
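(The original snippet and the printed `J` did not survive extraction; below is a minimal sketch of the pattern being described, with an assumed model and input sizes.)

```julia
using Lux, Random, Zygote

model = Chain(Dense(2 => 4, tanh), Dense(4 => 2))
ps, st = Lux.setup(Xoshiro(0), model)
smodel = StatefulLuxLayer{true}(model, ps, st)

x = randn(Xoshiro(1), Float32, 2, 4)  # 2 features, batch of 4

# Jacobian of the flattened batched output w.r.t. the flattened input:
# an 8×8 matrix in which only the four 2×2 diagonal blocks are nonzero,
# because sample i's output does not depend on sample j's input.
J = only(Zygote.jacobian(xx -> vec(smodel(xx)), x))
```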
I think this is not very efficient, but maybe I am missing something about how Lux internally manages these calculations. I tried computing the Jacobian/gradient for each individual input layer value, and this seems to be very inefficient. Other options off the top of my head include multiplying `J` by a vector of ones instead, to avoid computing the zero entries; see the sketch after this paragraph. I think this is what `Lux.jacobian_vector_product` is doing? Any suggestions here? I would just like to see this example follow best practices when using Lux.
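For reference, the multiply-by-ones idea corresponds to a single JVP per batch, which never materializes the zero blocks. A sketch, continuing from the `smodel` and `x` defined in the previous snippet:

```julia
using ADTypes, ForwardDiff  # jacobian_vector_product requires the ForwardDiff backend

# One JVP computes J*v without ever building J. With v = ones, this yields
# the row sums of each per-sample block of the Jacobian.
v = ones(Float32, size(x))
Jv = jacobian_vector_product(smodel, AutoForwardDiff(), x, v)  # same shape as smodel(x)
```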
Thank you so much! All this looks amazing.