Vision
Year | Paper | Authors | Implementations
2014 | VAE | Kingma and Welling | [✓] Training on MNIST [✓] Encoder output visualization [✓] Decoder output visualization |
2015 | CAM | Zhou et al. | [✓] Application to GoogLeNet [✓] Bounding box generation from CAM
2016 | Gatys et al., 2016 (image style transfer) | Gatys et al. | [✓] Application to VGGNet-19 |
YOLO | Redmon et al. | [✗] Training on VOC 2012 [✗] Class probability map [✗] Ground truth visualization on grid
DCGAN | Radford et al. | [✓] Training on CelebA at 64 × 64 [✓] Sampling [✓] Latent space interpolation |
Noroozi et al., 2016 | Noroozi et al. | [✓] Architecture [✓] Chromatic aberration [✓] Permutation set |
Zhang et al., 2016 (image colorization) | Zhang et al. | [✓] Empirical probability distribution [✗] Color space |
2014, 2017 | Conditional GAN & WGAN-GP | Mirza et al.; Gulrajani et al. | [✓] Training on MNIST
2016, 2017 | VQ-VAE & PixelCNN | Oord et al. | [✓] Training on Fashion MNIST [✓] Training on CIFAR-10
2017 | Pix2Pix | Isola et al. | [✓] Training on Google Maps [✓] Training on Facades [✗] Inference on larger resolution |
CycleGAN | Zhu et al. | [✓] Training on 'monet2photo' [✓] Training on 'vangogh2photo' [✓] Training on 'cezanne2photo' [✓] Training on 'ukiyoe2photo' [✓] Training on 'horse2zebra' [✓] Training on 'summer2winter_yosemite' |
Noroozi et al., 2017 | Noroozi et al. | [✓] Contrastive loss
2018 | PGGAN | Karras et al. | [✓] Training on CelebA-HQ at 512 × 512 |
DeepLab v3 | Chen et al. | [✓] Training on VOC 2012 [✓] Prediction on VOC 2012 validation set [✓] Average mIoU |
PixelLink | Deng et al. | [✓] Architecture [✓] Instance-balanced cross entropy loss [✓] Post-processing |
RotNet | Gidaris et al. | [✓] Attention map visualization
StarGAN | Choi et al. | [✓] Architecture
2020 | STEFANN | Roy et al. | [✓] FANnet architecture [✓] Training FANnet on Google Fonts [✓] Custom Google Fonts dataset [✓] Average SSIM |
DDPM | Ho et al. | [✓] Training on CelebA at 32 × 32 [✓] Training on CelebA at 64 × 64 [✓] Denoising process visualization [✓] Sampling using Linear interpolation [✓] Sampling using Coarse-to-fine interpolation |
DDIM | Song et al. | [✓] Sampling [✓] Sampling using Spherical linear interpolation [✓] Sampling using Grid Interpolation [✓] Truncated normal |
ViT | Dosovitskiy et al. | [✓] Training on CIFAR-10 [✓] Training on CIFAR-100 [✓] Attention Roll-out [✓] Position embedding similarity [✓] Position embedding interpolation [✓] CutOut [✓] Hide-and-Seek [✓] CutMix |
SimCLR | Chen et al. | [✓] Normalized temperature-scaled cross entropy loss [✓] Data augmentation [✓] Pixel intensity histogram |
DETR | Carion et al. | [✓] Architecture [✗] Bipartite matching & loss [✗] Batch normalization freezing [✗] Data preparation [✗] Training on COCO 2017 |
2021 | Improved DDPM | Nichol and Dhariwal | [✓] Cosine diffusion schedule |
Classifier-Guidance | Dhariwal and Nichol | [✗] AdaGN [✗] BigGAN Upsample/Downsample [✗] Improved DDPM sampling [✗] Conditional/Unconditional models [✗] Super-resolution model [✗] Interpolation
ILVR | Choi et al. | [✓] Sampling from single reference [✓] Sampling from various scale factors [✓] Sampling from various conditioning range |
SDEdit | Meng et al. | [✓] User input stroke simulation
MAE | He et al. | [✓] Architecture for pre-training [✗] Architecture for self-supervised learning [✗] Training on ImageNet-1K [✗] Fine-tuning [✗] Linear probing |
Copy-Paste | Ghiasi et al. | [✓] COCO dataset processing [✓] Large scale jittering [✓] Copy-Paste (within mini-batch) [✗] Gaussian filter |
ViViT | Arnab et al. | [✓] Model 1: 'Spatio-temporal attention' architecture [✓] Model 2: 'Factorised encoder' architecture [✓] Model 3: 'Factorised self-attention' architecture |
2022 | CFG | Ho and Salimans |
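Several diffusion rows above (DDPM, Improved DDPM, DDIM) revolve around the noise schedule. The "Cosine diffusion schedule" checklist item can be sketched in a few lines of NumPy; the function names and `T=1000` here are illustrative assumptions, not this repository's code:

```python
import numpy as np

def cosine_alpha_bar(T=1000, s=0.008):
    """Cosine schedule from Nichol & Dhariwal (2021):
    alpha_bar(t) = f(t)/f(0), with f(t) = cos^2(((t/T + s)/(1 + s)) * pi/2)."""
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f / f[0]

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    # beta_t = 1 - alpha_bar(t) / alpha_bar(t-1), clipped near t = T
    # where alpha_bar approaches zero and the ratio becomes unstable.
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

alpha_bar = cosine_alpha_bar()
betas = betas_from_alpha_bar(alpha_bar)
```

The small offset `s` keeps `beta_1` from being vanishingly small at `t = 0`; the clip at 0.999 is the paper's guard against singularities at the end of the schedule.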
Language
2017 | Transformer | Vaswani et al. | [✓] Architecture [✓] Position encoding visualization |
2019 | BERT | Devlin et al. | [✓] BookCorpus data pre-processing [✓] Architecture [✓] Masked language modeling [✓] SQuAD data pre-processing [✓] SWAG data pre-processing |
Sentence-BERT | Reimers and Gurevych | [✓] Classification loss [✓] Regression loss [✓] Contrastive loss [✓] STSb data pre-processing [✓] WikiSection data pre-processing [✗] NLI data pre-processing
RoBERTa | Liu et al. | [✓] BookCorpus data pre-processing [✓] Masked language modeling [✗] BookCorpus data pre-processing (SEGMENT-PAIR + NSP) [✗] BookCorpus data pre-processing (SENTENCE-PAIR + NSP) [✓] BookCorpus data pre-processing (FULL-SENTENCES) [✗] BookCorpus data pre-processing (DOC-SENTENCES) |
2021 | Swin Transformer | Liu et al. | [✓] Patch partition [✓] Patch merging [✓] Relative position bias [✓] Feature map padding [✓] Self-attention in non-overlapped windows [✗] Shifted Window based Self-Attention |
2024 | RoPE | Su et al. | [✓] Rotary Positional Embedding |
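The "Rotary Positional Embedding" item above admits a compact sketch: each consecutive feature pair at position p is rotated by an angle proportional to p, so attention scores depend only on relative offsets. This NumPy version uses assumed names and shapes; a real model would apply it to the query and key projections:

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotate each feature pair (x_{2i}, x_{2i+1}) at position p
    by angle p * base^(-2i/d).  x: (seq_len, d) with d even."""
    seq_len, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)        # (d/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)     # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each step is a pure rotation, vector norms are preserved and position 0 is left unchanged, which makes the function easy to sanity-check.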
Vision-Language
2021 | CLIP | Radford et al. | [✓] Training on Flickr8k + Flickr30k [✓] Zero-shot classification on ImageNet1k (mini) [✓] Linear classification on ImageNet1k (mini) |
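The CLIP zero-shot classification items boil down to cosine similarity between one image embedding and one text embedding per class prompt, followed by a softmax. A minimal sketch with random stand-in embeddings (the function name and temperature are assumptions; a real run would use the trained encoders):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """Score an image embedding against per-class text embeddings by
    cosine similarity, then return softmax class probabilities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature          # one logit per class prompt
    probs = np.exp(logits - logits.max())     # stable softmax
    return probs / probs.sum()

rng = np.random.RandomState(0)
image = rng.randn(4)
texts = rng.randn(3, 4)
texts[1] = image  # make class 1 the aligned prompt
probs = zero_shot_classify(image, texts)
```

The low temperature mirrors CLIP's learned logit scale: it sharpens the softmax so the best-matching prompt dominates.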