
ComposeTransformers: an end-to-end recommender system for women's clothing via image-text content retrieval.


recohut/Clothing_MMRetrieval

 
 


Clothing Retrieval

Abstract. In this work, we build an advanced recommender system for clothing retrieval from image-text queries: the user provides an image of a garment together with a text request describing a modification, and the system returns an image that matches the request. We employ Transformer-based feature extractors for images and text, learn the composed image-text features with supervised Deep Metric Learning, and enforce a rotational symmetry constraint in complex feature space. With this design, our ComposeTransformers model retrieves 55.42% of the relevant images within the top 50 search results over 1,200 queries against a database of 2,646 test images.
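The idea of composing an image with a text modification by a rotation in complex feature space can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: the function names, the hypothetical linear text-to-angle map `w_theta`, and the element-wise rotation are assumptions chosen only to show what such a composition and a rotational symmetry check could look like. Random vectors stand in for ViT image features and BERT text features.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_to_angles(txt_feat: np.ndarray, w_theta: np.ndarray) -> np.ndarray:
    # Hypothetical linear map from a text embedding to one angle per image dimension,
    # squashed into (-pi, pi).
    return np.tanh(txt_feat @ w_theta) * np.pi

def compose(img_feat: np.ndarray, angles: np.ndarray) -> np.ndarray:
    # Lift the real image embedding into complex space and rotate every coordinate
    # by its angle; |exp(i*angle)| == 1, so magnitudes are unchanged.
    return img_feat.astype(np.complex128) * np.exp(1j * angles)

def symmetry_residual(target_feat: np.ndarray, img_feat: np.ndarray, angles: np.ndarray) -> float:
    # Rotating the target back by the opposite angles should recover the source image;
    # a training loss could penalise this residual (assumed form of the constraint).
    back = target_feat.astype(np.complex128) * np.exp(-1j * angles)
    return float(np.linalg.norm(back - img_feat))

# Random stand-ins for an image embedding (e.g. ViT) and a text embedding (e.g. BERT).
img = rng.normal(size=8)
txt = rng.normal(size=16)
w_theta = rng.normal(size=(16, 8))

angles = text_to_angles(txt, w_theta)
query = compose(img, angles)
print(symmetry_residual(query, img, angles))  # prints a value close to 0
```

Because a rotation is invertible, the text modification can be "undone" exactly, which is what a symmetry constraint of this kind exploits.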

Keywords: Vision Transformer, BERT, multi-modal search.

A. Paper and Seminar material

⭐ For details of the report, read this article.

⭐ For the seminar slides, see here.

B. Technical tools

Description of the source code modules:

  • requirements: the required libraries.
  • config: configuration file used in both the training and inference phases.
  • Fashion200k: folder containing all data and annotations (not available at the moment).
  • dataloader: data loading code, including image and text pre-processing.
  • img_text_composition_model: the image-text composition module.
  • logger: logger for the training phase.
  • tester: evaluates the performance of the retrieval model.
  • trainer: training code, which tracks the loss and evaluation metrics during training.
  • triplet_loss: soft triplet loss module.
  • utils: utility functions.
  • ComposeTransformers_Notebook: notebook for training, evaluation, and inference.
  • IMAGE_FTRS: extracted features and file paths for all images in the sub-dataset.
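The `triplet_loss` module is described only as a soft triplet loss. A common soft-margin formulation replaces the hinge max(0, d(a, p) - d(a, n) + m) with the smooth softplus log(1 + exp(d(a, p) - d(a, n))). The sketch below assumes that formulation with squared Euclidean distances; the repository's exact loss may differ, for example in the distance used or in how negatives are chosen.

```python
import numpy as np

def soft_triplet_loss(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray) -> float:
    """Soft-margin triplet loss averaged over a batch of (N, D) embeddings."""
    # Squared Euclidean distances anchor->positive and anchor->negative, shape (N,).
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative) ** 2, axis=1)
    # np.logaddexp(0, x) computes log(1 + exp(x)) without overflow for large x.
    return float(np.mean(np.logaddexp(0.0, d_ap - d_an)))

rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 8))    # e.g. composed image-text queries
positive = anchor + 0.01            # targets close to their queries
negative = rng.normal(size=(4, 8))  # unrelated images
print(soft_triplet_loss(anchor, positive, negative))
```

In a retrieval setting like this one, the anchor would be the composed image-text query, the positive its target image, and the negative another image from the batch. Unlike the hinge version, the softplus never becomes exactly zero, so well-separated triplets still contribute a small gradient.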

C. Results

The pre-trained model

If you would like the pre-trained model, please ask in an issue.

Evaluation results

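Precision (%)

| P@1  | P@10 | P@50 | P@100 |
|------|------|------|-------|
| 0.25 | 5.5  | 31.6 | 58.0  |

Recall (%)

| R@1 | R@10 | R@50 | R@100 |
|-----|------|------|-------|
| 4.9 | 22.6 | 55.4 | 75.7  |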
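Precision@K and Recall@K can be computed from a ranking of the database for each query. The sketch below assumes the standard per-query definitions (hits in the top K divided by K, and divided by the number of relevant images), averaged over all queries, and ranks by cosine similarity on real-valued features. The function name and the `relevant` argument format are hypothetical; the repository's `tester` may define or compute these metrics differently.

```python
import numpy as np

def precision_recall_at_k(query_feats: np.ndarray, db_feats: np.ndarray,
                          relevant: list, k: int) -> tuple:
    """Mean Precision@K and Recall@K; relevant[i] is the set of correct db indices for query i."""
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = q @ d.T
    # Indices of the k most similar database items per query, best first.
    topk = np.argsort(-sims, axis=1)[:, :k]
    precisions, recalls = [], []
    for i, rel in enumerate(relevant):
        hits = len(set(topk[i].tolist()) & rel)
        precisions.append(hits / k)
        recalls.append(hits / len(rel))
    return float(np.mean(precisions)), float(np.mean(recalls))

# Toy example: 4 database images, 2 queries.
db = np.eye(4)
queries = db[:2]
print(precision_recall_at_k(queries, db, [{0, 1}, {1}], k=1))
```

Under these definitions, the R@50 entry above corresponds to the 55.42% of relevant images retrieved within the top 50 results reported in the abstract.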

D. Demo

Output process
demo.mp4
Full demo video


Languages

  • Jupyter Notebook 99.6%
  • Python 0.4%