- The CSV files used for the scores in the Vilio paper are now available here
- Thanks to an initiative by katrinc, here are two notebooks for using Vilio to run pure inference on any meme you want :)
- Just adapt the example input dataset / input model to use a different meme / pretrained model 🥶
- GPU: https://www.kaggle.com/muennighoff/vilioexample-nb
- CPU: https://www.kaggle.com/muennighoff/vilioexample-nb-cpu
Vilio aims to replicate the organization of huggingface's transformers repo: https://github.com/huggingface/transformers. The repository is structured as follows (a sketch of how the pieces fit together follows the table):
| Path | Description |
| --- | --- |
| /bash | Shell files to reproduce Hateful Memes results |
| /data | Default directory for loading data & saving checkpoints |
| /ernie-vil | ERNIE-ViL sub-repository, written in PaddlePaddle |
| /fts_lmdb | Scripts for handling extracted .lmdb features |
| /fts_tsv | Scripts for handling extracted .tsv features |
| /notebooks | Jupyter notebooks for demonstration & reproducibility |
| /py-bottom-up-attention | Sub-repository for .tsv feature extraction, forked & adapted from https://github.com/airsplay/py-bottom-up-attention |
| /src/vilio | All implemented models (also see below for a quick overview of models) |
| /utils | Pandas & ensembling scripts for data handling |
| entry.py files | Scripts used to access the models and apply model-specific data preparation |
| pretrain.py files | Same purpose as the entry files, but for pre-training; point of entry for pre-training |
| hm.py | Training code for the Hateful Memes challenge; main point of entry |
| param.py | Arguments for running hm.py |
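To make the layering concrete, here is a minimal, hypothetical sketch of how the pieces relate. All class and function names below are illustrative placeholders, not the repo's actual API: a model implementation lives in /src/vilio, an entry file wraps it with model-specific data preparation, and hm.py drives training with the arguments from param.py.

```python
# Hypothetical sketch of Vilio's layering; names are placeholders, not the real API.
import torch
import torch.nn as nn


class VisualBertCore(nn.Module):
    """Stand-in for a model implemented in /src/vilio."""

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, 2)  # hateful vs. not hateful

    def forward(self, text_feats, img_feats):
        fused = text_feats + img_feats  # the real models use cross-modal attention
        return self.classifier(fused)


class EntryV:
    """Stand-in for an entry.py file: model-specific data prep + forward pass."""

    def __init__(self):
        self.model = VisualBertCore()

    def prepare(self, batch):
        # The real entry files tokenize text and align extracted image features here.
        return batch["text"], batch["img"]

    def __call__(self, batch):
        text, img = self.prepare(batch)
        return self.model(text, img)


# hm.py-style driver: feed batches, compute the loss, step the optimizer.
entry = EntryV()
optimizer = torch.optim.AdamW(entry.model.parameters(), lr=1e-5)
batch = {
    "text": torch.randn(4, 768),         # stand-in for tokenized text features
    "img": torch.randn(4, 768),          # stand-in for extracted image features
    "label": torch.randint(0, 2, (4,)),  # binary hateful/not-hateful labels
}
logits = entry(batch)
loss = nn.functional.cross_entropy(logits, batch["label"])
loss.backward()
optimizer.step()
```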
Follow SCORE_REPRO.md to reproduce performance on the Hateful Memes task.
Follow GETTING_STARTED.md to use the framework for your own task.
See the paper at: https://arxiv.org/abs/2012.07788
🥶 Vilio currently provides the following architectures, each listed with the paper that introduced it (a lookup sketch of the one-letter codes follows the list):
- E - ERNIE-VIL ("ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph")
- D - DeVLBERT ("DeVLBert: Learning Deconfounded Visio-Linguistic Representations")
- O - OSCAR ("Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks")
- U - UNITER ("UNITER: UNiversal Image-TExt Representation Learning")
- V - VisualBERT ("VisualBERT: A Simple and Performant Baseline for Vision and Language")
- X - LXMERT ("LXMERT: Learning Cross-Modality Encoder Representations from Transformers")
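These one-letter codes are how the repo refers to its model families. Purely as illustration, the convention amounts to a simple lookup like the one below; the dictionary and helper are a sketch, not code from the repo.

```python
# Illustrative mapping of Vilio's one-letter model codes to architectures;
# this lookup is a sketch for orientation, not code taken from the repo.
MODEL_CODES = {
    "E": "ERNIE-VIL",
    "D": "DeVLBERT",
    "O": "OSCAR",
    "U": "UNITER",
    "V": "VisualBERT",
    "X": "LXMERT",
}


def describe(code: str) -> str:
    """Return the architecture name for a one-letter model code."""
    try:
        return MODEL_CODES[code.upper()]
    except KeyError:
        raise ValueError(f"Unknown model code: {code!r}") from None


print(describe("U"))  # UNITER
```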
Planned improvements:
- Clean up import statements & Python paths, and find a better way to integrate transformers (right now, the import statements only work from the main folder)
- Enable loading and running models via import statements alone (without having to clone the repo)
- Find a better way to include ERNIE-VIL in this repo (PaddlePaddle to Torch?)
- Move tokenization in the entry files to model-specific tokenization, similar to transformers (see the sketch below)
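For reference, the model-specific tokenization that the last to-do aims for is the pattern transformers already uses: each checkpoint name resolves to the tokenizer class matching that model. The snippet below uses the real transformers API; the checkpoint name and input text are just example values.

```python
# transformers-style, model-specific tokenization (the pattern the to-do targets):
# AutoTokenizer picks the tokenizer class that matches the given checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
encoded = tokenizer(
    "a meme caption to classify",  # example input text
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 128])
```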
The code heavily borrows from the following repositories; thanks for their great work:
- https://github.com/huggingface/transformers
- https://github.com/facebookresearch/mmf
- https://github.com/airsplay/lxmert
If you use Vilio in your work, please cite:

    @article{muennighoff2020vilio,
      title={Vilio: State-of-the-art visio-linguistic models applied to hateful memes},
      author={Muennighoff, Niklas},
      journal={arXiv preprint arXiv:2012.07788},
      year={2020}
    }