Themo 🗿

Themo, named after the beloved Chilean cartoonist Themo Lobos, is a BERT-based CLIP text encoder trained in Spanish.

Why Themo?

Multimodal learning has revolutionized many aspects of deep learning, but most multimodal models are trained only on English data, and thus only work in that language.

Our goal is to take advantage of the knowledge already present in CLIP by fine-tuning a language model pre-trained on Spanish so that it learns to map text into CLIP's shared latent space, following Multilingual-CLIP's approach.
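The idea can be sketched as a teacher-student setup: the frozen CLIP text encoder embeds an English caption, the Spanish model embeds the corresponding Spanish caption, and training pulls the two vectors together. The snippet below is a minimal, framework-free illustration, assuming a mean-squared-error objective as in Multilingual-CLIP. The function name and the toy vectors are hypothetical and are not taken from the themo code.

```python
from typing import Sequence


def distillation_mse(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between student and teacher embeddings.

    Each argument is a batch of equally sized vectors. In the setup described
    above, `teacher` would hold CLIP text embeddings of English captions and
    `student` the Spanish model's embeddings of the matching Spanish captions.
    """
    if len(student) != len(teacher) or not student:
        raise ValueError("batches must be non-empty and of equal size")
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student, teacher):
        if len(s_vec) != len(t_vec):
            raise ValueError("embedding dimensions must match")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


# Toy 2-dimensional embeddings, made up purely for illustration.
teacher_batch = [[1.0, 0.0], [0.0, 1.0]]
student_batch = [[1.0, 0.0], [0.0, 0.0]]
loss = distillation_mse(student_batch, teacher_batch)
```

In a real training loop the same quantity would be computed on tensors and minimized with respect to the student's weights only, since the teacher stays frozen.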

Currently, we have only trained a small proof-of-concept version. We plan to train more versions once we have a robust Spanish-only multimodal dataset and access to more GPUs. 😊

Training 🧪

To train your own version of Themo, simply run:

python -m themo train

Evaluation 📝

Our best results were achieved with the following hyperparameters:

python -m themo train --batch-size 256 --learn-rate 8e-5

This configuration achieved a final training loss of 0.244 and the following evaluation scores:

Metric	@01	@05	@10
Accuracy	0.366	0.586	0.649
Retrieval	0.481	0.752	0.850
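For readers unfamiliar with the @k notation, the snippet below sketches one common way such scores are computed: a query counts as a hit if its correct item is among the k highest-scoring candidates. This assumes the columns denote top-1, top-5, and top-10 cutoffs. The function and the toy similarity matrix are illustrative and are not the evaluation code used by themo.

```python
from typing import Sequence


def hit_rate_at_k(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose target index is among the top-k scores."""
    if len(scores) != len(targets) or not scores:
        raise ValueError("scores and targets must be non-empty and aligned")
    hits = 0
    for row, target in zip(scores, targets):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy example: 3 queries scored against 3 candidates (values made up).
similarity = [
    [0.9, 0.1, 0.0],  # target 0 is ranked first
    [0.5, 0.3, 0.2],  # target 1 is ranked second
    [0.6, 0.3, 0.1],  # target 2 is ranked third
]
targets = [0, 1, 2]
at_1 = hit_rate_at_k(similarity, targets, k=1)
at_2 = hit_rate_at_k(similarity, targets, k=2)
at_3 = hit_rate_at_k(similarity, targets, k=3)
```

On the toy data the hit rate grows from one third at k=1 to two thirds at k=2 and reaches 1.0 at k=3, which mirrors how the scores in the table increase from @01 to @10.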

To evaluate your trained model, run a command like the following, replacing the path with your own run's version directory:

python -m themo test --version-path logs/.../version_X

For the sake of comparison, here are the baseline results (taken from Multilingual-CLIP):

Metric	@01	@05	@10
Accuracy	0.370	0.594	0.660
Retrieval	0.504	0.795	0.888
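To make the comparison concrete, the gap between the two sets of results can be computed directly. The numbers below are copied from the tables in this section; the variable names are ours.

```python
# Scores copied from the two tables, in the order @01, @05, @10.
themo = {
    "Accuracy": [0.366, 0.586, 0.649],
    "Retrieval": [0.481, 0.752, 0.85],
}
baseline = {
    "Accuracy": [0.370, 0.594, 0.660],
    "Retrieval": [0.504, 0.795, 0.888],
}

# Baseline minus Themo, rounded to the three decimals reported in the tables.
gaps = {
    task: [round(b - t, 3) for t, b in zip(themo[task], baseline[task])]
    for task in themo
}

for task, values in gaps.items():
    print(task, values)
# Accuracy [0.004, 0.008, 0.011]
# Retrieval [0.023, 0.043, 0.038]
```

The proof-of-concept model trails the baseline by 0.004 to 0.011 on accuracy and by 0.023 to 0.043 on retrieval.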

These can also be reproduced by running:

python -m themo test --baseline

Evaluation Data

Some of the data is tricky to obtain, and much of it is redundant because we only use the test sets.

For simplicity, here are instructions for downloading the data we use.

MSCOCO / XTD10

The captions come from the official XTD10 repo, and the implementation takes care of downloading them.

The images come from standard MSCOCO, but not all images are used. To download the filtered version run:

mkdir -p data/mscoco && wget -O- https://users.dcc.uchile.cl/\~gchapero/datasets/coco_xtd10.tar.gz | tar -xz -C data/mscoco

You can also use the full MSCOCO dataset, but it takes up far more disk space than needed.

The data directories should look like this for the images to be located properly:

data
...
├── mscoco
│   ├── train2014
│   │   ...
│   │   ├── COCO_train2014_000000436508.jpg
│   │   ├── COCO_train2014_000000436515.jpg
│   │   ...
│   └── val2014
│       ...
│       ├── COCO_val2014_000000127068.jpg
│       ├── COCO_val2014_000000127074.jpg
│       ...
...

The command above should leave the files in this layout. Any extra directories and files are ignored, so you can use the full MSCOCO if you prefer.
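If you want to confirm the layout before running an evaluation, a quick check along these lines may help. It is a minimal sketch that assumes the directory and file naming shown in the tree above; the function name is hypothetical and this check is not part of themo.

```python
from pathlib import Path
from typing import Dict


def count_coco_images(root: Path) -> Dict[str, int]:
    """Count COCO_<split>_*.jpg files under root/train2014 and root/val2014.

    A missing split directory is reported with a count of 0.
    """
    counts = {}
    for split in ("train2014", "val2014"):
        split_dir = root / split
        if split_dir.is_dir():
            counts[split] = len(list(split_dir.glob(f"COCO_{split}_*.jpg")))
        else:
            counts[split] = 0
    return counts


if __name__ == "__main__":
    # "data/mscoco" matches the download command in this section.
    for split, n in count_coco_images(Path("data/mscoco")).items():
        status = "ok" if n > 0 else "missing or empty"
        print(f"{split}: {n} images ({status})")
```

Files that do not match the expected naming pattern are simply not counted, which is consistent with extra files being ignored.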

ImageNet

As with MSCOCO, you can place the full ImageNet in the data directory, but the training images are not needed. The following command downloads only the splits needed for this work:

mkdir -p data/imagenet && wget -O- https://users.dcc.uchile.cl/\~gchapero/datasets/imagenet_object_localization_patched2019_val_test_only.tar.gz | tar -xzC data/imagenet

The data directory should end up looking like this, whether you use full ImageNet or our filtered version:

data/
├── imagenet
│   ├── ILSVRC
│   │   ├── Annotations
│   │   │   └── CLS-LOC
│   │   │       └── val
│   │   │           ├── ILSVRC2012_val_00000001.xml
│   │   │           ├── ILSVRC2012_val_00000002.xml
│   │   │           └── ...
│   │   └── Data
│   │       └── CLS-LOC
│   │           ├── test
│   │           │   ├── ILSVRC2012_test_00000001.JPEG
│   │           │   ├── ILSVRC2012_test_00000002.JPEG
│   │           │   └── ...
│   │           └── val
│   │               ├── ILSVRC2012_val_00000001.JPEG
│   │               ├── ILSVRC2012_val_00000002.JPEG
│   │               └── ...
│   ├── LOC_sample_submission.csv
│   ├── LOC_synset_mapping.txt
│   ├── LOC_train_solution.csv
│   └── LOC_val_solution.csv
└── ...

Any extra dirs or files are ignored, so that you can use the full ImageNet if you have it at hand.
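As a sanity check on the downloaded metadata, the class list in LOC_synset_mapping.txt can be read with a few lines of Python. This sketch assumes the usual format of that file, one WordNet ID per line followed by a space and comma-separated English names. The function name is hypothetical, and themo may load the file differently.

```python
from typing import Dict, List


def parse_synset_mapping(text: str) -> Dict[str, List[str]]:
    """Map each WordNet ID to its list of English class names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The ID is everything before the first space; the rest is the names.
        wnid, _, names = line.partition(" ")
        mapping[wnid] = [name.strip() for name in names.split(",")]
    return mapping


# Two sample lines in the assumed format.
sample = "n01440764 tench, Tinca tinca\nn01443537 goldfish, Carassius auratus\n"
classes = parse_synset_mapping(sample)
```

To run it on the real file, pass in the contents of data/imagenet/LOC_synset_mapping.txt, for example via `pathlib.Path(...).read_text()`.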
