How to design classes that can load pretrained checkpoints. #2454
-
Interesting, I guess the goal is to eliminate the need for the separate configuration class and directly define the model without a separate module. The new API looks fine to me. I didn't really understand the issue with dataclasses (which avoid repetitive code and errors). Can you just decorate it directly with …?
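For reference, the plain dataclass-style module definition being alluded to — fields declared as attributes, no custom `__init__` (example mine):

```python
import flax.linen as nn

# Default linen style: the module *is* a dataclass; linen generates
# __init__ (including the parent/name plumbing) from the fields.
class AddConstant(nn.Module):
    a: int

    def __call__(self, x):
        return x + self.a

module = AddConstant(a=1)
y = module.apply({}, 3)  # -> 4
```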
-
Hey @patrickvonplaten, this would be amazing! I've been using the …

Regarding custom `__init__`s:

Point 1 is required because Flax internally re-instantiates modules by calling `__init__` with the `parent` and `name` keyword arguments (see lines 972 to 974 in e320e11). Example:

```python
import flax.linen as nn

class GoodCustomInit(nn.Module):
    a: int

    def __init__(self, a: int, parent=None, name=None):
        self.a = a
        self.parent = parent
        self.name = name

    def __call__(self, x):
        return x + self.a

module = GoodCustomInit(1)
y = module.apply({}, 3)
```

Notice you must additionally provide and set the `parent` and `name` attributes. If you don't, `apply` fails when Flax re-instantiates the module:

```python
class BadCustomInit(nn.Module):
    a: int

    def __init__(self, a: int):
        self.a = a

    def __call__(self, x):
        return x + self.a

module = BadCustomInit(1)
y = module.apply({}, 3)  # TypeError: __init__() got an unexpected keyword argument 'parent'
```

Additionally, beware of how mixins interact with Flax (e.g. #1409), and in general take into account how inheritance works in dataclasses (check out some patterns that don't work here). A strategy that might work:
Example:

```python
import flax.linen as nn

class A(nn.Module):
    a: int

    def __init__(self, a: int, parent=None, name=None):
        self.a = a
        self.parent = parent
        self.name = name

    def __call__(self, x):
        return x + self.a

class B(A):
    b: int

    def __init__(self, a: int, b: int, **kwargs):
        self.b = b
        super().__init__(a, **kwargs)

    def __call__(self, x):
        return super().__call__(x) + self.b

module = B(1, 2)
y = module.apply({}, 3)
```
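As an aside, a minimal example (mine, plain `dataclasses`, not from the thread) of the inheritance pattern mentioned above that does not work:

```python
from dataclasses import dataclass

@dataclass
class Base:
    a: int = 1  # field with a default

@dataclass
class Child(Base):  # TypeError at class creation:
    b: int          # non-default argument 'b' follows default argument 'a'
```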
-
To avoid all the mess I've mentioned about having custom `__init__`s, a new decorator could be applied to the class directly:

```python
@new_register_to_config
class BertModel(nn.Module):
    ...
```
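A rough sketch (mine — `new_register_to_config` is hypothetical) of what such a class decorator could do: derive the config from the module's auto-generated dataclass fields, so no custom `__init__` is needed:

```python
import dataclasses
import flax.linen as nn

def new_register_to_config(cls):
    # Hypothetical: read the config off the module's dataclass fields,
    # so no custom __init__ (and no parent/name plumbing) is needed.
    def to_config(self):
        return {
            f.name: getattr(self, f.name)
            for f in dataclasses.fields(self)
            if f.name not in ("parent", "name")  # skip Flax-internal fields
        }
    cls.to_config = to_config
    return cls

@new_register_to_config
class Toy(nn.Module):
    features: int
    use_bias: bool = True

    @nn.compact
    def __call__(self, x):
        return nn.Dense(self.features, use_bias=self.use_bias)(x)

module = Toy(features=4)
print(module.to_config())  # {'features': 4, 'use_bias': True}
```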
-
Hey,
We would like to integrate Flax/JAX into the diffusers repo: huggingface/diffusers#475.
Similar to Transformers, a very important aspect of `diffusers` is the ability to load pretrained checkpoints easily into the classes, so we would like to have the following functionality.
Because of this we decided on the following design in Transformers (defining it here as design 1).
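For concreteness, design 1 in Transformers looks like this (sketch; checkpoint name illustrative):

```python
import jax.numpy as jnp
from transformers import FlaxBertModel

# Design 1: from_pretrained returns a wrapper that owns the weights;
# calling it reads model.params implicitly, which is stateful-ish.
model = FlaxBertModel.from_pretrained("bert-base-uncased")
input_ids = jnp.ones((1, 8), dtype="i4")
outputs = model(input_ids)  # uses model.params under the hood
```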
In hindsight, @patil-suraj and I had the feeling that this might have been the wrong design, as it breaks the "stateless" assumption of JAX models. This is the reason we added a second API to transformers only recently: huggingface/transformers#16148
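For illustration, a sketch of that second API (checkpoint name illustrative):

```python
import jax.numpy as jnp
from transformers import FlaxBertModel

# Second API: skip materializing weights on the wrapper and get the
# params back separately, then pass them in explicitly per call.
model, params = FlaxBertModel.from_pretrained("bert-base-uncased", _do_init=False)
input_ids = jnp.ones((1, 8), dtype="i4")
outputs = model(input_ids, params=params)
```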
Having thought about this a bit more, we would like to change the design in diffusers from the start and only adopt the "stateless" solution. This would mean we would like to do something like the following (design 2).
Note that this would mean that the API for JAX would be something like:
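A hedged sketch of that API — `FlaxUNet2DModel` and the checkpoint are assumed names for this sketch, shapes illustrative:

```python
import jax.numpy as jnp
from diffusers import FlaxUNet2DModel  # assumed class name

# Design 2: the module itself is stateless; from_pretrained returns
# (module, params), and params are threaded through apply explicitly.
model, params = FlaxUNet2DModel.from_pretrained("fusing/unet-ldm-dummy")
sample = jnp.zeros((1, 3, 32, 32))     # illustrative shapes
timestep = jnp.ones((1,), dtype="i4")
out = model.apply({"params": params}, sample, timestep)
```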
Does this API make sense to you?
The main questions would be:
1.) Is it ok to define an `__init__(...)` in an `nn.Module`? Note that we need to do this so that our `register_to_config` function works as expected: https://github.com/huggingface/diffusers/blob/25a51b63ca75e1351069bee87a0fb3df5abb89c3/src/diffusers/models/unet_2d.py#L58 (a simplified sketch of this pattern follows below)
2.) What do you think about the design opinion here in general? Would you also favor 2.) over 1.) to keep "stateless-ness" intact?
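For context on question 1, a simplified sketch (mine, not the actual diffusers source) of the `register_to_config` pattern: the decorator wraps `__init__`, inspects its arguments, and records them as the config — which is why a real `__init__` must exist:

```python
import functools
import inspect

def register_to_config(init):
    """Simplified sketch: capture the arguments passed to __init__
    and stash them as the instance's config."""
    @functools.wraps(init)
    def wrapped_init(self, *args, **kwargs):
        bound = inspect.signature(init).bind(self, *args, **kwargs)
        bound.apply_defaults()
        config = {k: v for k, v in bound.arguments.items() if k != "self"}
        # object.__setattr__ sidesteps frozen/managed attribute handling
        object.__setattr__(self, "_internal_config", config)
        init(self, *args, **kwargs)
    return wrapped_init

class MyModel:
    @register_to_config
    def __init__(self, sample_size=32, in_channels=3):
        self.sample_size = sample_size
        self.in_channels = in_channels

m = MyModel(sample_size=64)
print(m._internal_config)  # {'sample_size': 64, 'in_channels': 3}
```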
cc @marcvanzee @jheek @cgarciae @levskaya @jekbradbury @boris @borisdayma