[FEATURE] Support FasterViT #1842
Comments
Yeah, noticed this one. It is timm oriented but, as always, it bakes in square image size assumptions and puts the downsample at the end of the blocks, so it needs a decent amount of attention to fix and remap :( I really, truly don't understand the obsession with putting the downsample at the end of vit/hybrid blocks :( The other thing is, I've never found gcvit (same authors) to be particularly easy to train or fine-tune (including reproducing the original results) compared to vit, swin, and convnext (whose original results I've successfully managed to reproduce and improve on). I wonder how this compares... Given the complexity of the model code, I found the throughput #s surprising, as more code usually == more activations and slower speeds.
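For anyone picking this up, here is a rough sketch of what the "fix and remap" step could look like on the checkpoint side: each downsample that sits at the end of stage i gets moved to the start of stage i+1. The key layout (`levels.<i>.downsample.*` in the source, `stages.<i>.*` in the target) and the function name `remap_downsample_keys` are assumptions for illustration only, not the actual FasterViT or timm naming, so check the real checkpoint keys before relying on it.

```python
import re

# Hypothetical source layout: 'levels.<i>.downsample.<param>' is the downsample
# applied at the END of stage i. This is an assumption, not the real checkpoint.
_END_DOWNSAMPLE = re.compile(r"^levels\.(\d+)\.downsample\.(.+)$")


def remap_downsample_keys(state_dict):
    """Return a new dict with end-of-stage downsamples moved to the next stage."""
    remapped = {}
    for key, value in state_dict.items():
        match = _END_DOWNSAMPLE.match(key)
        if match:
            # Downsample at the end of stage i becomes the start of stage i + 1.
            stage, rest = int(match.group(1)), match.group(2)
            remapped[f"stages.{stage + 1}.downsample.{rest}"] = value
        elif key.startswith("levels."):
            # All other per-stage parameters keep their index, only the prefix changes.
            remapped["stages." + key[len("levels."):]] = value
        else:
            # Stem, head, and anything else pass through unchanged.
            remapped[key] = value
    return remapped


if __name__ == "__main__":
    example = {
        "levels.0.blocks.0.weight": "w0",
        "levels.0.downsample.conv.weight": "d0",
        "head.weight": "h",
    }
    for name in remap_downsample_keys(example):
        print(name)
```

This only covers renaming keys. The model definition itself would still need the downsample moved into the start of each stage's forward pass, and the square image size assumptions are a separate fix.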
Hi guys, is there any update on this issue? The throughput is really high.
Hi, I can take this one. I'll begin by moving the downsamples as mentioned here.
"FasterViT: Fast Vision Transformers with Hierarchical Attention"
https://github.com/NVlabs/FasterViT
The code is based on timm and provides pretrained weights on ImageNet-1k. However, it contains many customized layers that differ from timm's implementations, so I'm not sure whether this code would need significant adjustments.
It looks interesting, but it doesn't seem like the paper has been released.