This repository contains Python code to retrieve Steam games with similar store banners, using Microsoft's BEiT.
Image similarity is assessed by the cosine similarity between image features encoded by BEiT.
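As a minimal sketch of this matching step (using NumPy, with random vectors standing in for BEiT image features; the 768 dimension matches the base model's hidden size):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: 768-dim vectors standing in for BEiT-encoded banners.
rng = np.random.default_rng(0)
query = rng.normal(size=768)
bank = rng.normal(size=(5, 768))  # features of 5 candidate banners

# Score all candidates against the query and rank them (best first).
scores = bank @ query / (np.linalg.norm(bank, axis=1) * np.linalg.norm(query))
ranking = np.argsort(-scores)
```

In practice the feature bank holds one vector per Steam banner, and the top-ranked indices are the most similar games.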
BEiT is a Vision Transformer (ViT):
- pre-trained with self-supervision (using patches, and "visual tokens" from OpenAI's DALL-E) on ImageNet-21k,
- then fine-tuned for classification on ImageNet-21k (14M images with ~21k classes),
- finally fine-tuned for classification on ImageNet-1k (1.28M images with 1000 classes).
Pre-trained models are available on Hugging Face, respectively as:
- `microsoft/beit-base-patch16-224-pt22k`
- `microsoft/beit-base-patch16-224-pt22k-ft22k`
- `microsoft/beit-base-patch16-224`

Larger models are available by replacing the keyword `base` (~400 MB) with `large` (~1.2 GB) in the model name.
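A minimal sketch of extracting image features with one of these checkpoints via the Hugging Face `transformers` library (the pooled output is assumed here to serve as the feature vector; a blank image stands in for a real banner):

```python
import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitModel

model_name = "microsoft/beit-base-patch16-224-pt22k-ft22k"
processor = BeitImageProcessor.from_pretrained(model_name)
model = BeitModel.from_pretrained(model_name)
model.eval()

# Dummy stand-in for a 300x450 vertical Steam banner.
image = Image.new("RGB", (300, 450))

# Preprocess (resize, normalize) and encode the image.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pooled output: one 768-dim feature vector per image (base model).
features = outputs.pooler_output
```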
NB: Table 9 shows that BEiT performs worse than DINO in terms of linear probing on ImageNet-1k. However, keep in mind that DINO concatenates features of intermediate layers!
Data consists of vertical Steam banners (300x450 resolution), resized to 256x384 resolution. This is performed with rom1504/img2dataset.
- To download image data, run `download_steam_webdataset.ipynb`. Alternatively, you can find the data as `v0.1` in the "Releases" section of this repository.
- To match images, run `match_steam_banners_with_BEiT.ipynb`.
- Microsoft's Bidirectional Encoder representation from Image Transformers (BEiT)
- A generic repository to match images:
  - `match-steam-banners`: retrieve games with similar banners
- My usage of Google's Big Transfer (BiT):
  - `steam-BiT`: retrieve games with similar banners, using Google's BiT
- My usage of Facebook's DINO:
  - `steam-DINO`: retrieve games with similar banners, using Facebook's DINO (resolution 224)
- My usage of OpenAI's CLIP:
  - `steam-CLIP`: retrieve games with similar banners, using OpenAI's CLIP (resolution 224)
  - `steam-image-search`: retrieve games using natural language queries
  - `heroku-flask-api`: serve the matching results through an API built with Flask on Heroku