Update ML Decoder #2045
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Currently have 3 variants: the original with compatibility changes for PT 2.1+, a version identical to the original but with a performance fix that's giving ~35% training speedup, and a WIP reimplementation that updates styling and changes the decoder block implementation. The last one involves architectural changes that remove some of the odd components from #1012 (residual on dropouts and queries, MLP residual location, norm locations). I'm in the process of testing these changes. @mrT23 do you have any comments?

@rwightman are you aware of any PyTorch transformer decoder/cross attention implementations that follow the style of this library or that I could reference? I inferred the cross attention impl from the ViT attn impl and the block structure from the original impl, but I would rather this impl follow something standard (less the self attn in the decoder) than introduce other odd implementation choices.

Model compatibility is iffy: there are a few models with odd architectures (combining multiple feature maps, distillation architectures, etc.) that would be a pain to special case and probably won't be used, and a few other more prominent architectures that don't work because they use NHWC. Overall it seems like it would be difficult to maintain alongside #2048.

@rwightman Do you think a revised ClassifierHead that supports additional pooling and head formats would be better? There are quite a few structures that can be placed there (pool->ffn, various ml-decoder-like mechanisms). Since many models already have ...
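For reference, a minimal sketch of the kind of cross attention module described above, written in the style of timm's ViT `Attention` block; the class name and the separate q/kv projections are illustrative assumptions, not the exact code in this PR.

```python
import torch
import torch.nn as nn


class CrossAttention(nn.Module):
    """Cross attention in the style of timm's ViT Attention (illustrative sketch)."""

    def __init__(self, dim, num_heads=8, qkv_bias=True, attn_drop=0., proj_drop=0.):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.q = nn.Linear(dim, dim, bias=qkv_bias)       # projects the class/query embeddings
        self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)  # projects the backbone tokens
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, q, x):
        # q: (B, Nq, C) query embeddings, x: (B, N, C) backbone tokens
        B, Nq, C = q.shape
        N = x.shape[1]
        q = self.q(q).reshape(B, Nq, self.num_heads, self.head_dim).transpose(1, 2)
        kv = self.kv(x).reshape(B, N, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        k, v = kv.unbind(0)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        x = (attn @ v).transpose(1, 2).reshape(B, Nq, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x
```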
@fffffgggg54 curious what your goals are for this impl, what sort of applications, etc.
Goals are primarily performance, compatibility, and consistent styling with newer timm implementations. The legacy version provides support and improved performance, and the reimplementation attempts to match other timm models and removes the odd components noted above.

I work almost exclusively with a dataset (danbooru) that poses a multi-label positive-unlabeled problem. I use MLDecoder for this sometimes and recently noticed the PT 2.1 and extensive model compat issues, along with the slow groupFC impl, odd dropout, etc., prompting both versions. In addition to PU-specific techniques (often in math-heavy papers that are a bit of a headache to read), some researchers focus on aspects of the model, often what comes after the backbone (GNNs, text towers, MLDecoder, activation functions). The labeling scheme of danbooru is set up such that the labels present in an image can be mapped out hierarchically, similar to a scene graph. This is also done internally via label implications. I'm also working on a from-scratch impl of DependencyViT for this (not going well); hopefully it can exploit the tree structure. I have a drop-in replacement for ...
Added tests from my own testing script; they will fail because there are models that don't work, 95 variants specifically, mostly due to distillation heads, multiple feature maps, or wrong input shape. The code to add the head is messy; a universal head should fix that.
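A rough sketch of the kind of per-model smoke test described here; the model list is a placeholder subset and the import path of `add_ml_decoder_head` is an assumption, not the exact tests added in this PR.

```python
import pytest
import torch
import timm

from timm.models.layers.ml_decoder import add_ml_decoder_head  # assumed path; adjust to this branch

MODEL_SUBSET = ['resnet50', 'regnety_004']  # placeholder; the real sweep covers many more variants


@pytest.mark.parametrize('model_name', MODEL_SUBSET)
def test_ml_decoder_head_forward(model_name):
    model = timm.create_model(model_name, pretrained=False, num_classes=10)
    model = add_ml_decoder_head(model)  # exact signature may differ in this branch
    out = model(torch.randn(2, 3, 224, 224))
    assert out.ndim == 2 and out.shape[0] == 2
```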
Experimental feature. I want to wait to merge this until a universal head is implemented. This and other things I'm working on are a pain to implement/use/add to timm without a universal head.
allow external class embed (ex text embeddings of class descriptions), head version toggle
force-pushed from bf08a92 to b927237
Update ML Decoder's `TransformerDecoderLayerOptimal` module to comply with what `nn.TransformerDecoder` expects. Current changes work with resnet50; `add_ml_decoder_head` needs to be updated for other models. In my limited testing, the following case works with RegNet:
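A minimal usage sketch of that RegNet case, assuming `add_ml_decoder_head` takes a timm model and returns it with the ML Decoder head attached; the import path and exact signature are assumptions, not verbatim from this PR.

```python
import torch
import timm

from timm.models.layers.ml_decoder import add_ml_decoder_head  # assumed path; adjust to this branch

model = timm.create_model('regnety_004', pretrained=False, num_classes=1000)
model = add_ml_decoder_head(model)  # replaces the classifier with an ML Decoder head

out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # expected (2, 1000) if the head keeps num_classes
```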