CRIER: Custom Reverse Image Extractions Ranked


Author

Engineered by Abduselam Shaltu as the final project for CSE 455, taught by Joseph Redmon at the University of Washington.

Published on March 18, 2022

Source code is available for both the backend and the frontend. A Jupyter Notebook for quickly demoing CRIER is provided and is also available in Google Colab.

Abstract

In this project, I develop a reverse image search engine that is easily customizable. I open-source a backend built in Python that provides indexing of custom image corpora, separation between corpora, simple search functionality, endpoints with a one-command server spin-up, token management so that users retrieve images only from their own custom corpus, evaluation using mAP@k and mAR@k metrics, comparison against other retrieval implementations (a histogram-based image retrieval implementation is provided), and Jupyter notebooks for demos and experiments. I also open-source a frontend built in React that connects to the backend server and supports the following: searching through a provided example image database, indexing a new image corpus, searching through the indexed image corpora, and deleting the indexed image corpora. I compare CRIER with a histogram-based image retrieval model across three different datasets. I also discuss the challenges and takeaways of the project. A video summary of CRIER can be found here.

Problem and Motivation

For the average consumer, searching consists of typing keywords (or, in some advanced systems, phrases) to retrieve relevant documents, images, tables, and other kinds of data. When it comes to images, the average consumer can search with keywords or phrases on their own phones and computers, and can search with keywords, phrases, or images on large-scale systems like Google Search and Google Images. For this project, I intend to build an easily customizable system that retrieves relevant images from a custom database by searching with an image. For people who want to search their own image collections, systems like Google Images are unhelpful since they reverse image search through a web-indexed image corpus. Additionally, the barrier to entry can be costly and tedious for consumers and industry workers trying to test the usefulness of custom reverse image search engines on cloud services like Google Cloud, Microsoft Azure, and Amazon Web Services. This project proposes CRIER, a Custom Reverse Image Extractions Ranked system. CRIER drastically lowers the barrier for consumers and industry workers across different sectors to create a custom reverse image search engine.

Previous work

For this project, I relied on feature vectors extracted from EfficientNetV2. EfficientNetV2, released by Google, performs exceptionally well on tasks that call for a CNN. It outperforms other SOTA models and trains 5-10x faster on image datasets like CIFAR and ImageNet. Below is a plot comparing its top-1 accuracy against other SOTA models on ImageNet.

Comparison showing EfficientNetV2 outperforming other SOTA on the ImageNet dataset

I use ScaNN, also released by Google, as my Approximate Nearest Neighbor (ANN) search library. ScaNN reaches SOTA performance among ANN libraries through a technique called "Anisotropic Vector Quantization". Below is a plot demonstrating its high queries per second (QPS) and accuracy in comparison to other popular ANN libraries.

Comparison showing high accuracy and queries per second where the scann library is dominating

Approach

The backend and frontend servers run on a Microsoft Azure VM of size Standard B2s.

Backend (Python)

The core of my approach is to use EfficientNetV2's feature extractor as an image encoder that produces a fixed-size, 1280-dimensional embedding for each image. Images are resized to 384x384 with bilinear interpolation to fit the model, and pixel values are normalized to the range [0, 1]. Then, I "index" the images by passing all the embeddings into a ScaNN search model. To run a reverse image search, I follow the same steps: encode the query image into a 1280-dimensional embedding, then search with the ScaNN index.
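The preprocessing step can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the function names `bilinear_resize` and `normalize` are hypothetical, the example works on a tiny grayscale grid rather than a 384x384 RGB image, and it assumes an "align corners" sampling convention, which may differ from the convention used by the image library in the actual backend.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Assumption: output corners map exactly onto input corners
    ("align corners"); real libraries may sample differently.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional source row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column back to a fractional source column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Blend horizontally on the two neighbouring rows, then vertically.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def normalize(image):
    """Scale 8-bit pixel values (0..255) into the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]


tiny = [[0, 255],
        [255, 0]]
resized = bilinear_resize(tiny, 3, 3)
scaled = normalize(resized)
```

In the real pipeline the same two operations would be applied per color channel before the image is passed to the encoder.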

There are many variants of EfficientNetV2. I use EfficientNetV2-S because it has a relatively small number of parameters (~20M) and high accuracy. I set up a ScaNN index with DotProduct as the similarity function.
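To make the role of the dot-product similarity concrete, here is a small sketch of what the index computes conceptually. It is an exact brute-force search written with the standard library only, swapped in for illustration; it is not ScaNN's API and has none of ScaNN's approximation or speed. The names `dot` and `search`, the dictionary-based index, and the 2-dimensional toy embeddings are all assumptions (real embeddings would have 1280 dimensions).

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(index, query, k):
    """Return the ids of the k embeddings with the highest dot product to query.

    `index` maps an image id to its embedding (a list of floats).
    """
    scored = ((dot(vec, query), image_id) for image_id, vec in index.items())
    return [image_id for _, image_id in heapq.nlargest(k, scored)]


# Toy 2-D embeddings standing in for encoder output.
toy_index = {
    "cat": [1.0, 0.0],
    "house": [0.0, 1.0],
    "tree": [0.7, 0.7],
}
top2 = search(toy_index, [0.9, 0.1], k=2)
```

An ANN library trades a small amount of this exactness for much higher query throughput on large corpora.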

I created REST APIs to allow easy interaction with CRIER: CreateToken, RemoveImages, AddImages, SearchDatabase, and RetrieveImages. I used Flask to create and manage the backend server. To allow multiple users to create their own databases, I built a token manager that uses token authentication to ensure users retrieve images only from their own image database. Since this is a demo, a scheduler also erases stored image corpora and ScaNN index models.
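The idea of keeping each user's corpus behind a token can be sketched as follows. This is only an illustration under assumptions: the `TokenManager` class, its method names (loosely modeled on the endpoint names above), and the in-memory dictionary storage are hypothetical and are not taken from the CRIER backend, which exposes this functionality through Flask endpoints.

```python
import secrets


class TokenManager:
    """Hypothetical in-memory store that keeps one image corpus per token."""

    def __init__(self):
        self._corpora = {}  # token -> {image name: embedding}

    def create_token(self):
        """Issue a new random token with an empty corpus."""
        token = secrets.token_urlsafe(16)
        self._corpora[token] = {}
        return token

    def add_images(self, token, images):
        """Add {name: embedding} pairs to the corpus owned by `token`."""
        self._corpus_for(token).update(images)

    def retrieve_images(self, token):
        """List the image names in the corpus owned by `token`."""
        return sorted(self._corpus_for(token))

    def remove_images(self, token):
        """Delete every image in the corpus owned by `token`."""
        self._corpus_for(token).clear()

    def _corpus_for(self, token):
        # Reject tokens that were never issued.
        if token not in self._corpora:
            raise PermissionError("unknown token")
        return self._corpora[token]


manager = TokenManager()
alice = manager.create_token()
bob = manager.create_token()
manager.add_images(alice, {"cat.jpg": [1.0, 0.0]})
```

Because every operation goes through `_corpus_for`, one user's token can never read or modify another user's corpus in this sketch.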

Frontend (React)

The frontend is built in React JS. There isn't anything too special about the frontend besides creating an actual interface for users to interact with CRIER. I use react-markdown to render this project info page from a markdown file.

Evaluation

Using OpenCV, I built a histogram-based image retrieval model to compare against CRIER. A histogram-based embedding is made of two parts: the flattened histograms of an image across all RGB channels, and the means of the RGB values. Combining the two increases the number of features in each image embedding for the histogram-based retrieval model.
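A minimal sketch of such an embedding, written in plain Python instead of OpenCV, is shown here. The function name `histogram_embedding`, the bin count, and the choice to divide counts by the number of pixels are assumptions made for illustration; the source does not state which bin count or normalization the baseline uses.

```python
def histogram_embedding(pixels, bins=8):
    """Build an embedding from per-channel histograms plus channel means.

    `pixels` is a non-empty list of (r, g, b) tuples with values in 0..255.
    The result has 3 * bins histogram entries followed by 3 channel means.
    """
    hist = [[0] * bins for _ in range(3)]
    sums = [0, 0, 0]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map an 8-bit value to one of `bins` equal-width buckets.
            hist[channel][value * bins // 256] += 1
            sums[channel] += value
    n = len(pixels)
    # Assumption: normalize counts so images of different sizes are comparable.
    flat = [count / n for channel_hist in hist for count in channel_hist]
    means = [s / n for s in sums]
    return flat + means


embedding = histogram_embedding([(0, 0, 0), (255, 255, 255)], bins=2)
```

Embeddings built this way capture color distribution only, which is one reason a learned encoder can be expected to retrieve semantically similar images more reliably.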

To calculate mAP@k and mAR@k metric values, I use the recmetrics and ml_metrics modules.

Datasets and Data Augmentation

I didn't do any fine-tuning or transfer learning with the EfficientNetV2 model, since I use the pre-trained feature extractor as an image encoder. In principle, developers who want to fine-tune EfficientNetV2 on a particular domain of images (for example, a medical doctor fine-tuning on chest X-rays to diagnose lung disease) are able to do so, although this project does not provide any fine-tuning methods.

For evaluation, I use three different datasets:

  • CIFAR-100-128: Regular CIFAR-100, but resized with the CAI Neural API to 128x128 for increased pixel information.
  • Imagenette: sized at 320x320, also for more pixel information.
  • A custom dataset that I put together, containing pictures of cats, sunflowers, trees, and houses.

Demo

Here is the provided example image corpus that users can search through.

Images: Cat, Friends cat, House 1, House 2, Sunflower 1, Sunflower 2, Sunflower dog, Tree

Briefly, here is what it looks like to upload an image database. Notice that I am using the example image corpus shown above as my database.

Demo of uploading images

My first query will be with this white cat (notice how it is not an image from my database).

Demo of search results of white

And results after querying for the white cat image.

Demo of search results

Another query this time with a blue house (also not in my database).

Demo of search results

And results from searching for the blue house.

Demo of search results

In both search results, it is clear that the model is performing well in returning relevant search results.

Model Evaluation

I measured mAP@k and mAR@k for both CRIER and the histogram-based image retrieval model across my custom dataset, CIFAR-100-128, and Imagenette. mAP@k and mAR@k are the mean Average Precision and mean Average Recall over the top-k retrievals. These metrics are typically used to evaluate recommendation systems. mAP@k evaluates the relevancy of the retrieved items (images in our case), whereas mAR@k evaluates how well the recommender (the CRIER or histogram model) is able to recall all the items the user has rated positively in the test set.
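For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from scratch. It is an assumption-laden illustration, not the code of the libraries used in this project: the function names are hypothetical, and the normalization (dividing by min(number of relevant items, k) for precision and by the number of relevant items for recall) follows one widely used definition that a given library may or may not share.

```python
def _hit_score(actual, predicted, k):
    """Sum of precision values at each rank where a new relevant item appears."""
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(predicted[:k], start=1):
        if item in actual and item not in seen:
            seen.add(item)
            hits += 1
            score += hits / rank
    return score


def average_precision_at_k(actual, predicted, k):
    """AP@k for one query; assumed normalizer is min(len(actual), k)."""
    if not actual:
        return 0.0
    return _hit_score(actual, predicted, k) / min(len(actual), k)


def average_recall_at_k(actual, predicted, k):
    """AR@k for one query; assumed normalizer is len(actual)."""
    if not actual:
        return 0.0
    return _hit_score(actual, predicted, k) / len(actual)


def mean_over_queries(metric, actual_lists, predicted_lists, k):
    """Average a per-query metric over all queries (gives mAP@k or mAR@k)."""
    values = [metric(a, p, k) for a, p in zip(actual_lists, predicted_lists)]
    return sum(values) / len(values)


relevant = [[1, 2, 3, 4], [5]]      # relevant item ids per query
retrieved = [[1, 2, 9], [6, 5]]     # ranked results per query
map_at_2 = mean_over_queries(average_precision_at_k, relevant, retrieved, 2)
mar_at_2 = mean_over_queries(average_recall_at_k, relevant, retrieved, 2)
```

In the toy data, the first query finds two relevant items in its top 2 but misses two others, so its precision score is perfect while its recall score is only one half.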

To evaluate the datasets, I first split each dataset into an index portion and a test portion. The splits are percentage-based, with ~95% of the dataset going into the index corpus and the rest going into the test portion. The number of results returned by the ScaNN model is set to 25 for the CIFAR-100-128 and Imagenette datasets, and to 10 for the custom dataset since it is so small.
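A percentage-based split of this kind could look like the sketch below. The function name `split_index_test`, the shuffling step, and the fixed seed are assumptions; the source only states the approximate 95/5 proportions, not how items are assigned to each portion.

```python
import random


def split_index_test(items, test_fraction=0.05, seed=0):
    """Split items into (index_portion, test_portion).

    Assumption: items are shuffled with a fixed seed so the split is
    reproducible; at least one item always goes to the test portion.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


index_items, test_items = split_index_test(range(200))
```

Each test image then serves as a query against the index portion, and the ranked results feed the mAP@k and mAR@k calculations.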

The plots below were created with the recmetrics module. In all three cases, CRIER clearly outperforms the histogram-based image retrieval model (higher is better).

Evaluation on CIFAR-100-128

Mean Average Precision at K plot showing CRIER outperforming the Histogram based image retrieval on CIFAR 100 128

Mean Average Recall at K plot showing CRIER outperforming the Histogram based image retrieval on CIFAR 100 128

Evaluation on Imagenette

Mean Average Precision at K plot showing CRIER outperforming the Histogram based image retrieval on Imagenette 320

Mean Average Recall at K plot showing CRIER outperforming the Histogram based image retrieval on Imagenette 320

Evaluation on custom dataset

Mean Average Precision at K plot showing CRIER outperforming the Histogram based image retrieval on the custom dataset

Mean Average Recall at K plot showing CRIER outperforming the Histogram based image retrieval on the custom dataset

Discussion

What problems did you encounter?

Honestly, too many that I've lost track.

Are there next steps you would take if you kept working on the project?

Find and fix bugs, keep it open-source, and encourage people to find benefits in custom reverse image search. I would also make it more customizable to provide metadata about the image retrieved so the user can do more. Oh, also make this website way more user-friendly and accessible.

How does your approach differ from others? Was that beneficial?

This is essentially a playground for building a customizable reverse image search engine, something that, to my knowledge, hasn't been done before. CRIER seems like a very beneficial tool in many fields. As a consumer, it would be nice to search through my own phone's images, or through a single album on my phone, with an image. As a student, I see it as a fun tool to play with and a great entry point for future students who want to major in Computer Science and specialize in Machine Learning and Computer Vision. Some of my peers studying medicine told me they see it as useful for quickly retrieving past diagnoses of other patients from images of an injured patient's body parts, to help determine the best diagnosis. There are also many medical image datasets that a medical student could drag into the CRIER frontend interface and then search with a new image to find the most similar images and understand how to make a correct diagnosis.
