We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show that the dataset is also suitable for image generation conditioned on text.
- The paper (with supplementary material) was published at BMVC 2022.
- We set up a multilingual image-text classification benchmark on Papers with Code.
- A Colab notebook is available on Google Colab and also in this repository.
- A video walkthrough of the Colab code is available on YouTube.
If you use or reference the dataset, please cite the paper using the following BibTeX entry:
@inproceedings{Kosar_2022_BMVC,
author = {Vaclav Kosar and Antonín Hoskovec and Milan Šulc and Radek Bartyzal},
title = {GLAMI-1M: A Multilingual Image-Text Fashion Dataset},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year = {2022},
url = {https://bmvc2022.mpi-inf.mpg.de/0607.pdf}
}
Try a hands-on exercise with the dataset in this Google Colab notebook.
You can either download the dataset ZIP file manually or use the repository scripts to download and extract it and load it into a dataframe.
To manually download the dataset ZIP file(s):
- GLAMI-1M dataset (the default 228x298 version): here (1 ZIP file, 11 GB).
- You can download this file manually with the following command:
wget -O GLAMI-1M-dataset.zip https://huggingface.co/datasets/glami/glami-1m/resolve/main/GLAMI-1M-dataset.zip
- Calculate its MD5 hash with:
md5sum GLAMI-1M-dataset.zip
- Then check that it returns:
500348bbf54595db81cba353acd50d78 GLAMI-1M-dataset.zip
- GLAMI-1M dataset, 800x800 version: here (11 ZIP files, 54 GB in total; see Zenodo for the MD5 hashes).
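The MD5 check above can also be scripted in Python, which is convenient on systems without md5sum and when verifying the eleven 800x800 archives. This is a minimal sketch using only the standard library; the helper names md5_of_file and verify_md5 are hypothetical and not part of the repository. Only the 228x298 hash below comes from this README; take the 800x800 hashes from Zenodo.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Hypothetical helper (not from the repository). Streaming keeps memory
    use flat even for the 11 GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(expected):
    """Check a mapping of file path -> expected MD5 hex digest.

    Returns the paths that are missing or whose digest does not match;
    an empty list means every file checked out.
    """
    bad = []
    for path, want in expected.items():
        p = Path(path)
        if not p.is_file() or md5_of_file(p) != want.lower():
            bad.append(str(path))
    return bad


if __name__ == "__main__":
    # Hash published in this README for the default 228x298 archive.
    expected = {"GLAMI-1M-dataset.zip": "500348bbf54595db81cba353acd50d78"}
    mismatches = verify_md5(expected)
    print("OK" if not mismatches else f"MD5 mismatch or missing: {mismatches}")
```

To check the 800x800 version, add one filename/hash pair per ZIP file to the expected mapping.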
Together with the dataset, we provide helper code and the experiments described in the paper.
The following steps set up a minimal installation for downloading the dataset with Python:
conda create -n g1m python=3.9
conda activate g1m
git clone https://github.com/glami/glami-1m.git
cd glami-1m
pip install -r requirements_minimal.txt
The 228x298 version of the dataset can be downloaded programmatically with the repository's Python code. The directory into which the dataset is downloaded and extracted is configured with the environment variable EXTRACT_DIR. By default, EXTRACT_DIR points to a Linux temporary directory, which is cleared when the machine restarts. Before downloading, make sure you have enough disk space to download and unzip the corresponding version of the dataset.
EXTRACT_DIR="/tmp/GLAMI-1M/" python -c 'import load_dataset; load_dataset.download_dataset()'
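The free-space check recommended above can be scripted before starting the download. This sketch reads EXTRACT_DIR the same way the shell commands set it and compares the free space on that filesystem with a budget you choose. The helper names, the fallback directory, and the budget of twice the archive size are illustrative assumptions, not the repository's behavior.

```python
import os
import shutil
import tempfile
from pathlib import Path

GB = 10**9


def resolve_extract_dir(default=None):
    """Return the target directory from the EXTRACT_DIR environment variable.

    Falls back to `default`, or to <system temp dir>/GLAMI-1M. This fallback
    is an assumption for illustration; the repository's own default may differ.
    """
    if default is None:
        default = os.path.join(tempfile.gettempdir(), "GLAMI-1M")
    return Path(os.environ.get("EXTRACT_DIR", default))


def has_enough_space(target, required_bytes):
    """True if the filesystem holding `target` has at least `required_bytes` free.

    shutil.disk_usage needs an existing path, so walk up to the nearest
    existing parent (the filesystem root always exists).
    """
    path = Path(target).resolve()
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    extract_dir = resolve_extract_dir()
    # Assumed budget: room for the 11 GB ZIP and its extracted copy side by side.
    needed = 2 * 11 * GB
    ok = has_enough_space(extract_dir, needed)
    status = "enough" if ok else "NOT enough"
    print(f"{extract_dir}: {status} free space for ~{needed // GB} GB")
```

Walking up to an existing parent lets you run the check before EXTRACT_DIR has been created.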
After the download, we can load the dataset into a dataframe.
EXTRACT_DIR="/tmp/GLAMI-1M/" python -c 'import load_dataset; print(load_dataset.get_dataframe("test").head())'
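Once a split is loaded, a quick look at how items are spread across languages and classes helps put the reported accuracy in context. This standard-library sketch takes rows as dicts (for a pandas dataframe, df.to_dict("records")). The column names geo and category_name are assumptions, so check df.columns and pass the actual names. The majority-class baseline is a generic sanity check, not a result from the paper.

```python
from collections import Counter, defaultdict


def label_stats(rows, lang_key="geo", label_key="category_name"):
    """Return (items per language, number of distinct classes per language).

    `rows` is an iterable of dicts, e.g. df.to_dict("records").
    The default column names are assumptions; override them if they differ.
    """
    per_lang = Counter()
    classes_per_lang = defaultdict(set)
    for row in rows:
        lang = row[lang_key]
        per_lang[lang] += 1
        classes_per_lang[lang].add(row[label_key])
    return per_lang, {lang: len(c) for lang, c in classes_per_lang.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class (0.0 for no labels)."""
    counts = Counter(labels)
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / sum(counts.values())
```

With the repository code, something like label_stats(load_dataset.get_dataframe("test").to_dict("records")) should work, provided the column names match.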
To run all the code in the repository, install all requirements with:
pip install -r requirements.txt
The code is available in the folders classification, image-to-text, and translation.
Weights for the baseline models described in the paper are available here.