Usability Test Suite: Improving Usability in Flax #1619

cgarciae · 2021-10-15T20:12:25Z

cgarciae
Oct 15, 2021
Maintainer

Hey Flax team! I had a chat with @avital a while ago and he invited me to post about things that I believe are usability issues in Flax so here it is.

In this discussion I will be comparing Flax with the new family of Pytree-based Module systems that includes Treex and Equinox, since they seem to improve usability while being roughly as powerful (there are known edge cases though). Please feel free to suggest edits; I am writing this in "vanilla" Flax and might not know the more advanced mechanics.

Goal

Figure out if there is a way for Flax to simplify some of the presented cases.

Calling Modules

From a user perspective this is probably the most noticeable difference between Flax and traditional module systems like Pytorch and Keras.

Normal Call

Flax
When calling the top module in Flax using apply you have a lot of control at the cost of being significantly more verbose. You also get an asymmetry between calling modules inside other modules and when calling them on the outside.

class MyModule(Module):
    def __call__(self, x):
        return self.submodule(x) # inside you make normal calls

module = MyModule()
variables = module.init(...)
...
# outside you use apply
y, updates = module.apply(
    variables,
    *args,
    mutable=["batch_stats"],
    rngs={"dropout": key},
    **kwargs,
)
variables = variables.copy(updates)

Trees
Pytree Modules tend to behave like regular Python objects: since they contain their own parameters and can be passed through jit, they generally don't need apply-like functions and rely on __call__ directly.

class MyModule(Module):
    def __call__(self, x):
        return self.submodule(x) # inside you make normal calls

module = MyModule()
module = module.init(...)
...
# outside you make normal calls
y = module(*args, **kwargs)

Calling Methods

This example is the same as the previous but the methods some_method and another_method will be called instead of the __call__ method.

Flax
To achieve this in Flax you use the method argument in apply, which lets you call a specific method; the tradeoffs are the same as in the previous example.

Note: I've never done this but I assume methods are called normally when inside other modules.

class MyModule(Module):
    def some_method(self, x):
        return self.submodule.another_method(x) # inside you call the method normally

module = MyModule()
variables = module.init(...)
...
# outside you use apply
y, updates = module.apply(
    variables,
    *args,
    mutable=["batch_stats"],
    rngs={"dropout": key},
    method=MyModule.some_method, # and specify the method argument
    **kwargs,
)
variables = variables.copy(updates)

Trees
As in the previous example, in Pytree Modules you just call the method directly.

class MyModule(Module):
    def some_method(self, x):
        return self.submodule.another_method(x) # inside you call the method normally

module = MyModule()
module = module.init(...)
...
# outside you call the method normally
y = module.some_method(*args, **kwargs)

Transferring Parameters

Transferring parameters from pretrained modules in Flax is one of the areas that will probably cause more friction for new users coming from Pytorch or Keras as having a parameter structure (variables) separate from the computational structure (the module) can be both a blessing and a curse.

Using pretrained modules

This first use case is about performing the typical Transfer Learning task of loading a pretrained model and fine-tuning it with an added linear classifier on top.

Flax
The tricky thing in Flax is inserting the pretrained parameters into the correct place on the new parameter structure as names will matter here and have to be known in advance.

class MyModule(Module):
    pretrained_module: PretrainedModule
    linear: Linear

    def __call__(self, x):
        x = self.pretrained_module(x)
        x = self.linear(x)
        return x

pretrained_variables, pretrained_module = load_pretrained(...)

module = MyModule(pretrained_module)
variables = module.init(...).unfreeze()

variables["params"]["PretrainedModule_0"] = pretrained_variables["params"]
variables["batch_stats"]["PretrainedModule_0"] = pretrained_variables["batch_stats"]
variables["cache"]["PretrainedModule_0"] = pretrained_variables["cache"]

Trees
Pytree Modules don't require anything special; this works like in Pytorch / Keras.

class MyModule(Module):
    pretrained_module: PretrainedModule
    linear: Linear

    def __call__(self, x):
        x = self.pretrained_module(x)
        x = self.linear(x)
        return x

pretrained_module = load_pretrained(...)

module = MyModule(pretrained_module)
module = module.init(...)

Loading a pretrained module inside `__init__`

This is a continuation of the previous case, but the loading step happens inside the new module. The motivation is that sometimes the loading code is abstracted away from the user; Keras "applications" tend to do something like this.

Flax
I don't know if this is possible in Flax.

Trees
For Pytree Modules the only thing that changes from the previous case is that the pretrained module is not passed as a parameter to the constructor but instead loaded and assigned within it.

class MyModule(Module):
    pretrained_module: PretrainedModule
    linear: Linear
    
    def __init__(self):
        self.pretrained_module = load_pretrained(...)
        ...

    def __call__(self, x):
        x = self.pretrained_module(x)
        x = self.linear(x)
        return x

module = MyModule()
module = module.init(...)

Extracting a submodule

Here we extract a submodule as its own "top-level" module.

Flax
The tricky thing in Flax, again, is getting the parameters into the structure of their new module: names matter here, and there is no editor support to get this right.

class VAE(Module):
    encoder: Encoder
    decoder: Decoder

    # let's pretend it's this simple
    def __call__(self, x):
        z = self.encoder(x)
        y = self.decoder(z)
        return y

module = VAE()
variables = module.init(...)
...
decoder_variables = {
    "params": variables["params"]["Decoder_0"],
    "batch_stats": variables["batch_stats"]["Decoder_0"],
    ...
}
decoder = module.decoder
... # maybe you serialize it and load it in another process
y = decoder.apply(decoder_variables, z, ...)

Trees
Nothing special for Pytree Modules, just getting hold of the reference is enough.

class VAE(Module):
    encoder: Encoder
    decoder: Decoder

    # let's pretend it's this simple
    def __call__(self, x):
        z = self.encoder(x)
        y = self.decoder(z)
        return y

module = VAE()
module = module.init(...)
...
decoder = module.decoder
... # maybe you serialize it and load it in another process
y = decoder(z)

Static State

Static state is often used to control, amongst other things, how Modules behave during training vs testing.

Flax
In Flax you do this by propagating static state all the way down from the jitted functions to the inner Modules.

class MyModule(Module):
    def __call__(self, x, training):
        x = self.dense(x)
        x = self.batch_norm(x, use_running_average=not training)
        x = self.dropout(x, deterministic=not training)
        x = jax.nn.relu(x)
        return x

module = MyModule()

@partial(jax.jit, static_argnums=(3,))
def predict(variables, x, key, training):
    y, updates = module.apply(
        variables,
        x, 
        training, 
        mutable=["batch_stats"],
        rngs={"dropout": key},
    )
    variables = variables.copy(updates)
    return y, variables

y_train, variables = predict(variables, x, key, True)
y_test, variables = predict(variables, x, key, False)

Trees
Pytree Modules can keep this state in the static part of the Pytree, so they can keep track of it for the user. JAX will recompile when it changes, so static_argnums is not needed.

class MyModule(Module):
    def __call__(self, x):
        x = self.linear(x)
        x = self.batch_norm(x)
        x = self.dropout(x)
        x = jax.nn.relu(x)
        return x

@jax.jit
def predict(module, x):
    y = module(x)
    return y, module

module = MyModule()
y_train, module = predict(module, x)

module = module.eval()
y_test, module = predict(module, x)
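
As a minimal sketch of the mechanism, here is a hand-written pytree that keeps a training flag in its static part. It is written directly against `jax.tree_util` rather than Treex or Equinox, and the `Scale` class and its `eval` method are hypothetical names chosen for illustration. The `traces` list only records how often the jitted function is actually traced.

```python
import jax
import jax.numpy as jnp
from jax.tree_util import register_pytree_node_class


@register_pytree_node_class
class Scale:
    # Hypothetical toy module: `w` is a dynamic leaf, `training` is static.
    def __init__(self, w, training=True):
        self.w = w
        self.training = training

    def tree_flatten(self):
        # children (traced by jit), aux_data (static, part of the cache key)
        return (self.w,), self.training

    @classmethod
    def tree_unflatten(cls, training, children):
        (w,) = children
        return cls(w, training)

    def eval(self):
        # Return a copy with the static flag switched off.
        return Scale(self.w, training=False)

    def __call__(self, x):
        scale = 0.5 if self.training else 1.0
        return scale * self.w * x


traces = []  # one entry per trace, not per call


@jax.jit
def predict(module, x):
    traces.append(module.training)
    return module(x)


module = Scale(jnp.array(2.0))
x = jnp.array(3.0)

y_train = predict(module, x)         # first call: traced with training=True
y_train_again = predict(module, x)   # same static part: compiled code is reused
y_test = predict(module.eval(), x)   # static part changed: traced again
```

Because the flag is part of the tree structure rather than a leaf, changing it alters the jit cache key, which is why no static_argnums is required.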

avital · 2021-11-17T05:06:03Z

avital
Nov 17, 2021

Hi @cgarciae -- thank you so much for the detailed note, and I apologize for the late reply. I wanted to jot down some (incomplete) parts of an answer, as a starting point, after chatting with @andsteing about this a couple of weeks ago.

Both @jheek and I, in the past, explored a design that's similar to the one you propose (the one in Treex). And I agree with you that simply calling methods seems cleaner than apply with all the kwargs. But the Flax design is built with a particular intention: to make it as hard as possible to use JAX in a non-functional way, which causes unexpected footguns.

The current Flax Linen design allows you to get access to stateful module instances (via Module.bind). So in some ways, many of your questions map to Flax if we ask: "what would happen if Module instances were made into pytrees, and we encouraged the use of Module.bind instead of Module.apply?"

Here's the general problem, though. Once you have stateful module instances as a first-class recommended practice, you run into JAX rough edges and "footguns" -- you can accidentally make big mistakes and not even know it. Here is one example (pseudo-code):

instantiated_module = module.bind(variables)

@jax.jit
def make_prediction(inputs):
  return instantiated_module(inputs)

This example seems to work, and sort of does. But it masks a big problem: the variables in instantiated_module are now compiled into the jitted function as constants, which means you have a huge JAXPR. Moreover, any future changes to the instantiated_module reference will not be detected, and future calls to make_prediction will use the previous variable values.
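
The stale-value behaviour can be reproduced without Flax at all. The sketch below uses a plain dict of parameters in place of a bound module; the names `predict_closed` and `predict_explicit` are made up for this illustration.

```python
import jax
import jax.numpy as jnp

params = {"w": jnp.array(2.0)}


@jax.jit
def predict_closed(x):
    # `params` is read from the enclosing scope while tracing,
    # so its current value is baked into the compiled function.
    return params["w"] * x


@jax.jit
def predict_explicit(params, x):
    # Here the parameters are real inputs of the compiled function.
    return params["w"] * x


x = jnp.array(3.0)
first = predict_closed(x)        # traced now, with w = 2.0

params = {"w": jnp.array(10.0)}  # "update" the parameters

stale = predict_closed(x)            # cache hit: still computed with w = 2.0
fresh = predict_explicit(params, x)  # sees the new value, w = 10.0
```

The second call to `predict_closed` has the same input shape and dtype, so JAX reuses the cached computation and never looks at the new `params`.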

Ultimately, there's a fundamental tension between Python object semantics and JAX jitted functions. Flax takes the approach of intentionally exposing only pure functions, so that there's less opportunity to accidentally write code that "seems like it should work but doesn't".

We should still be thinking about some of your comments above, such as the ergonomics of replacing parameters within a module subtree. Maybe we could think of a purely functional wrapper, like a "replace parameters" method of sorts. I haven't thought this through carefully.

marcvanzee · 2021-11-17T12:51:08Z

marcvanzee
Nov 17, 2021
Maintainer

I just want to add that I really appreciate the effort you took to compare Flax and Trees in such a detailed way!

I agree with you that things in Trees look much simpler and "Pytorch" like, so in a sense more normal and how you would expect it. However I also agree with Avital that having pure functions is absolutely fundamental to make sure you interoperate with JAX well. If you don't do this, you will definitely run into problems that are hard to catch.

However, I do think you raise a few great points regarding the Flax API that can/should be improved. Here are two replies:

`init` and `apply`

There is an asymmetry between using Module.apply() for top-level modules and simply calling submodules directly. Additionally, the APIs for init and apply are different even though init simply calls apply under the hood, which is somewhat confusing as well. I think this is a major source of confusion for new users that we should address in one way or another. I don't have a good suggestion here, but we should think about it more.

Loading pretrained weights / Submodules / Variable dict tricks

First of all, in your example the pretrained model will have a name assigned to self, so it will be pretrained_module and not PretrainedModule_0. This is the same behavior as Pytorch.

But I agree that this is quite cumbersome indeed. We could probably simplify this with a simple utility function like this:

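from flax import traverse_util

# path can be something like "submodule1/submodule2/pretrained_module"
def update_subtree(variables, update, path):
  # flatten both trees so every leaf is keyed by its full "/"-joined path
  variables = {'/'.join(k): v for k, v in traverse_util.flatten_dict(variables).items()}
  update = traverse_util.flatten_dict(update).items()
  # keep the collection name (k[0]) first, then insert `path` before the rest of the key
  update = [([k[0]] + path.split("/") + list(k[1:]), v) for k, v in update]
  variables.update({'/'.join(k): v for k, v in update})
  # rebuild the nested structure from the "/"-joined keys
  variables = traverse_util.unflatten_dict({tuple(k.split('/')): v for k, v in variables.items()})
  return variables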

Now you can simplify your example like this:

class MyModule(Module):
    pretrained_module: PretrainedModule
    linear: Linear

    def __call__(self, x):
        x = self.pretrained_module(x)
        x = self.linear(x)
        return x

pretrained_variables, pretrained_module = load_pretrained(...)

module = MyModule(pretrained_module)
variables = module.init(...).unfreeze()

variables = update_subtree(variables, pretrained_variables, "pretrained_module")

You can imagine similar utility functions for getting submodules, setting optimizer variables, etc.
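
As a rough sketch of what such a helper for getting a submodule's variables might look like: `extract_subtree` is a hypothetical name, not an existing Flax function, and the toy `variables` dict below only stands in for real arrays.

```python
from collections.abc import Mapping


def extract_subtree(variables, path):
    """Return {collection: subtree at `path`} for each collection that contains it."""
    keys = path.split("/")
    extracted = {}
    for collection, tree in variables.items():
        node = tree
        for key in keys:
            # Stop if this collection has nothing stored under the path.
            if not isinstance(node, Mapping) or key not in node:
                node = None
                break
            node = node[key]
        if node is not None:
            extracted[collection] = node
    return extracted


# Toy variables: floats stand in for parameter arrays.
variables = {
    "params": {"encoder": {"kernel": 1.0}, "decoder": {"kernel": 2.0}},
    "batch_stats": {"decoder": {"mean": 0.0}},
}

decoder_variables = extract_subtree(variables, "decoder")
```

Collections that have no entry under the path (here `batch_stats` for the encoder) are simply left out of the result, so the output can be passed to the submodule's apply as-is.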

cgarciae · 2021-11-25T21:42:02Z

cgarciae
Nov 25, 2021
Maintainer Author

Hey @avital and @marcvanzee, thanks for your responses!

I've recently been thinking of an alternative approach to the "Transferring Parameters" problem which might fit within Flax's model. I don't know some of the internals, so I hope this is not too crazy. If this idea makes sense I would be happy to port it to a FLIP.

The bind_init API

The core idea behind bind_init is: bind the variables to the module such that they can be leveraged only during init, thus transferring the parameters to the new structure. Some examples solving previous cases:

Using pretrained models

Performing parameter transfer this way is more streamlined, as you don't actually have to know the names under which the parameters will be used, and the pattern should be easy to remember:

class MyModule(Module):
    pretrained_module: PretrainedModule
    linear: Linear

    def __call__(self, x):
        x = self.pretrained_module(x)
        x = self.linear(x)
        return x

pretrained_variables, pretrained_module = load_pretrained(...)

pretrained_module = pretrained_module.bind_init(pretrained_variables)

module = MyModule(pretrained_module)
variables = module.init(...)

Loading a pretrained module inside __init__

bind_init would actually enable this use case, which wasn't possible before: bind_init would just have to be called inside __post_init__ or, if you are brave, it could even be done in a default_factory, e.g.:

@dataclass
class MyModule(Module):
    pretrained_module: PretrainedModule
    linear: Linear
    
    def __post_init__(self):
        pretrained_variables, pretrained_module = load_pretrained(...)
        self.pretrained_module = pretrained_module.bind_init(pretrained_variables)
        ...

    def __call__(self, x):
        x = self.pretrained_module(x)
        x = self.linear(x)
        return x

module = MyModule()
variables = module.init(...)

Extracting a submodule

This one is a bit different because we will actually call bind_init on the parent module but call init on a child module.

class VAE(Module):
    encoder: Encoder
    decoder: Decoder

    # let's pretend it's this simple
    def __call__(self, x):
        z = self.encoder(x)
        y = self.decoder(z)
        return y

module = VAE()
variables = module.init(...)

# do some training and stuff
...

# get new structure
decoder = module.bind_init(variables).decoder
decoder_variables = decoder.init(...) # should use all the existing ones

... # maybe you serialize it and load it in another process
y = decoder.apply(decoder_variables, z, ...)

As this example shows it would actually be expected that init binding is applied recursively.

Notes

I am not sure (given how Flax is implemented) whether bind_init would be able to take only variables, or whether it would also have to take a sample input to run the forward computation in order to recursively bind all submodules, as shown in the last example.

4 replies

avital Dec 16, 2021

Hi @cgarciae I'm finally starting to digest this. Two short questions come to mind, that'll help me and others think through this:

1. Does the pattern you're describing now work with Module.bind?
2. What is the motivation behind adding a new bind_init rather than making bind work (if it doesn't now)? Is it that if people start using Module.bind() then they will try to over-use it, leading to leaked tracers and closing over stateful objects? And that bind_init makes it harder to shoot yourself in the foot that way because it's more restricted?

cgarciae Dec 16, 2021
Maintainer Author

Hey @avital! I created this small example to test if bind currently supports this behaviour but it doesn't:

import jax
import numpy as np
from flax import linen as nn

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
x = np.random.uniform(size=(20, 3))

# @dataclass
class Parent(nn.Module):
    pretrained: nn.Module

    def __call__(self, x):
        return self.pretrained(x)

m1 = nn.Dense(10)
m1_variables = m1.init(k1, x)
m1 = m1.bind(m1_variables)

m2 = Parent(m1)
m2_variables = m2.init(k2, x)

# currently False, so this assertion fails
assert np.allclose(
    m1_variables["params"]["kernel"], 
    m2_variables["params"]["pretrained"]["kernel"]
)

Before writing this example it was unclear to me what bind should do in this case. It seems that it gets ignored. Part of me was expecting that it wouldn't be ignored, but that the bound variables from m1_variables wouldn't appear in m2_variables, as they would be treated as constants when self.pretrained(x) is called.

bind_init would be different from bind in two ways:

1. It wouldn't allow you to call the Module freely; it would just reuse the bound variables during init instead of creating new ones.
2. It would only allow the user to bind variables but not rng state.

avital Jan 13, 2022

I finally got some time to look more into this. Yes, indeed the code as you described doesn't work as you'd expect (I should've realized this, as the first step of init is to clone the module and bind the clone, so whatever bound submodules were around get lost). A few thoughts:

1. We should consider adding an assertion to init so that if you try to clone a module with bound submodules you get an error rather than having them confusingly being ignored. (The same would be true about .apply -- this would be a breaking change but a good one).
2. To get something close to the user experience you describe, we'd have to change Module.init() (or add a new method) that keeps a copy of the variable dicts of bound submodules. I wonder if that would lead to other problems.
3. Does vanilla bind() work for the "extracting submodule parameters" user journey? I think it does.

cgarciae Jan 13, 2022
Maintainer Author

> To get something close to the user experience you describe, we'd have to change Module.init() (or add a new method) that keeps a copy of the variable dicts of bound submodules. I wonder if that would lead to other problems.

I was originally thinking that if problems could arise from having to hold the variables then maybe the separate bind_init method could help mitigate issues by being more restrictive.

> Does vanilla bind() work for the "extracting submodule parameters" user journey? I think it does.

I think for all intents and purposes bind would do the job if we get the above example working 😀
