use nn.vjp to get gradients wrt inputs instead of params #2176

luweizheng · 2022-06-07T11:54:00Z

luweizheng
Jun 7, 2022

Hi all,

I want to get gradients wrt the model input. There is a thread discussing how to do this with a pure JAX function, and I already know how to do it that way.

Now I want to use the lifted version of nn.vjp inside the nn.Module to get the gradient.
Here is my code:

class FFNN(nn.Module):
    features: Sequence[int]

    @nn.compact
    def __call__(self, t, x, train: bool = True):
        x = jnp.hstack((t, x))
        for idx, out_feat in enumerate(self.features):
            x = nn.Dense(features=out_feat)(x)
            if idx != len(self.features) - 1:
                x = nn.relu(x)
        return x

class FBSDENN(nn.Module):
  @nn.compact
  def __call__(self, t, x):
    mlp = FFNN(features=1 * [10] + [1])
    (u, bwd) = nn.vjp(lambda mdl, t, x: mdl(t, x), mlp, t, x)
    params_grad, t_grad, x_grad = bwd(jnp.ones(u.shape))
    return x, x_grad

My model has two inputs, (t, x). I want to get the gradient wrt x.

I got the following error:

File ~/.conda/envs/jax/lib/python3.10/site-packages/flax/core/lift.py:339, in _bwd_wrapper(treedef, bwd_fn, tangent)
    338 def _bwd_wrapper(treedef, bwd_fn, tangent):
--> 339   vars_grad, inputs_grad = bwd_fn(tangent)
    340   vars_grad = treedef.unflatten(vars_grad)
    341   return inputs_grad, vars_grad

ValueError: too many values to unpack (expected 2)

I do not want the gradient wrt params. Should I use vjp_variables? If so, how do I use this option?

Answered by cgarciae

cgarciae · 2022-06-07T22:29:12Z

cgarciae
Jun 7, 2022
Maintainer

Two suggestions:

If you don't want the jacobian wrt t, pass t as a capture:

(u, bwd) = nn.vjp(lambda mdl, x: mdl(t, x), mlp, x)

vjp_variables='params' by default; try setting it to False.

4 replies

cgarciae Jun 7, 2022
Maintainer

If you don't need the jacobian wrt any of the variables I think using jax.vjp would be simpler:

(u, bwd) = jax.vjp(lambda x: mdl(t, x), x)

luweizheng Jun 8, 2022
Author

> Two suggestions:
>
> If you don't want the jacobian wrt t, pass t as a capture:
> (u, bwd) = nn.vjp(lambda mdl, x: mdl(t, x), mlp, x)
> vjp_variables='params' by default; try setting it to False.

Thanks a lot! Both of these methods work!

jax.vjp is clean and simple.

I hope more docs or examples on vjp_variables will be added.

luweizheng Jun 16, 2022
Author

> If you don't need the jacobian wrt any of the variables I think using jax.vjp would be simpler:
> (u, bwd) = jax.vjp(lambda x: mdl(t, x), x)

Hi @cgarciae, I just want to understand this better.

We can use jax.vjp or other JAX transformations here, which means that lambda x: mdl(t, x) is a pure function. Here mdl is an instance of nn.Module, and mdl(t, x) calls mdl.__call__.

We can use jax.vjp here because we do not need to modify mdl's params. If we used vmap instead, the params might gain an extra axis, so we would need to use the lifted nn.vmap.

Am I right?

cgarciae Jun 16, 2022
Maintainer

Hey @luweizheng,

Now that I think about it, you should always use nn.vjp, since you still have to lift the variables to avoid tracer errors. Concretely, if the module initializes or mutates variables inside vjp, they need to be handled properly. It's curious that jax.vjp worked at all, since Flax has checks in place to detect when it's being run inside an un-lifted JAX transformation.
