Comparison of cosine similarity performances between VGG16 and ResNet50
Estimated reading time: ⏱️ 5 min
- Learn how to extract feature vectors
- Compute similarity between images
- Apply data augmentation to enlarge the dataset
Programming languages:
- Python (framework TensorFlow)
.
├── README.md
│
├── data
│ ├── flowerpot.jpg
│ ├── vase.jpg
│ └── vase2.jpg
│
├── notebooks
│ └── extract_features.ipynb
│
└── report
├── augmented_img
│ ├── vaseAI0.jpg
│ ├── vaseAI1.jpg
│ └── ..
│
└── cos_sim
├── resnet50
│ ├── vase_flowerpot.jpg
│ ├── vase_vase.jpg
│ └── vase_vase2.jpg
│
└── vgg16
├── vase_flowerpot.jpg
├── vase_vase.jpg
├── vase_vase2.jpg
├── vase_vaseAI0.jpg
├── vase_vaseAI1.jpg
└── ..
This project aims to deepen knowledge of CNNs, especially feature extraction and image similarity computation. I decided to work with two CNNs pre-trained on ImageNet, VGG16 and ResNet50, and to compare their cosine similarity performances. Models can be loaded in two ways (a minimal loading sketch follows this list):
- to make predictions (`include_top=True`: the model is composed of all layers, i.e. the 'feature learning block' + the 'classification block')
- to extract features (`include_top=False`: the classification block is omitted)
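As an illustration, here is a minimal loading sketch using the Keras applications API from TensorFlow; the variable names are mine, and `pooling="avg"` is just one way to obtain a flat feature vector:

```python
from tensorflow.keras.applications import VGG16, ResNet50

# Full models (feature learning block + classification block) for predictions
vgg16_full = VGG16(weights="imagenet", include_top=True)
resnet50_full = ResNet50(weights="imagenet", include_top=True)

# Feature extractors only: the classification block is omitted
vgg16_features = VGG16(weights="imagenet", include_top=False, pooling="avg")
resnet50_features = ResNet50(weights="imagenet", include_top=False, pooling="avg")
```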
[Figure 1]: Architecture of the VGG16 (left) and ResNet50 (right)
First, I wondered which model could predict an image with the highest confidence. Here I chose to compare their performances on a vase image: ResNet50 was the best with a 99.89% score against 95.06% for VGG16. The idea in this part was to manipulate the models and understand how prediction works.
[Figure 2]: Comparison of predictions (VGG16/ResNet50)
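A minimal prediction sketch for ResNet50 could look like the following (the exact preprocessing used in the notebook may differ; swapping the imports gives the VGG16 version):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet", include_top=True)

# Load the image at the model's expected input size and preprocess it
img = image.load_img("data/vase.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Decode the top-3 ImageNet classes with their probabilities
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```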
Then I decided to visualize feature maps from the main blocks of VGG16. The feature maps output by each block are collected in a single forward pass to create an image. VGG16 has 5 main blocks (block1, block2, etc.), each ending in a pooling layer. You can choose which blocks to visualize through their layer indices: `idx = [2, 5, 9, 13, 17]  # [block1, block2, block3, block4, block5]`. Figure 3 highlights that the level of abstraction of the extracted features increases with network depth.
[Figure 3]: Visualization of the 5 main blocks from the VGG16
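A possible way to collect those feature maps in a single pass is to build a multi-output model over the selected layer indices. This is a sketch, not the notebook's exact code; `x` is assumed to be a preprocessed image batch as in the prediction example above:

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=True)

# One output per main block, selected by layer index
idx = [2, 5, 9, 13, 17]  # [block1, block2, block3, block4, block5]
feature_map_model = Model(inputs=base.inputs, outputs=[base.layers[i].output for i in idx])

# A single forward pass returns the feature maps of all five blocks
feature_maps = feature_map_model.predict(x)
```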
Now let's focus on feature vector extraction. Removing the top of the model (the classification block) makes it possible to extract a feature vector, as explained previously. The input images are then preprocessed (reshaping, RGB->BGR conversion, zero-centering on the ImageNet dataset). The global process in Figure 4 depicts how to compute the similarity between two images. Images were stored on AWS S3, and I used a notebook instance in AWS SageMaker. A feature vector was extracted for each image, and the vectors were then compared with cosine similarity: the `compute_similarity_img()` function computes the cosine of the angle between the two feature vectors.
[Figure 4]: Similarity computation process
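The project's `compute_similarity_img()` is not reproduced here, but a minimal sketch of the same idea (extraction with VGG16, then cosine similarity; the file paths and helper names below are mine) could look like this:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Feature extractor: classification block removed, global average pooling gives a vector
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path):
    """Preprocess an image (resize, RGB->BGR, zero-centering) and return its feature vector."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x).flatten()

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(extract_features("data/vase.jpg"),
                        extract_features("data/flowerpot.jpg")))
```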
Here are the cosine similarity results obtained with VGG16:
[Figure 5]: Cosine similarity using VGG16
I decided to enlarge the dataset and compare the results with data augmentation, as shown in Figure 6. For the data augmentation, I used an ImageDataGenerator
object to set up the augmentation parameters. It generates batches of tensor image data with real-time data augmentation:
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(
    rotation_range=30,        # Int: degree range for random rotations
    width_shift_range=0.1,    # Float: fraction of total width if < 1, or pixels if >= 1
    height_shift_range=0.1,   # Float: fraction of total height if < 1, or pixels if >= 1
    shear_range=0.15,         # Float: shear intensity (shear angle in counter-clockwise direction, in degrees)
    zoom_range=0.1,           # Float: range for random zoom
    channel_shift_range=10.,  # Float: range for random channel shifts
    horizontal_flip=True      # Boolean: randomly flip inputs horizontally
)
```
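Augmented copies can then be generated with `gen.flow()`; the output directory and file prefix below are assumptions, shown only to illustrate the mechanism:

```python
import numpy as np
from tensorflow.keras.preprocessing import image

# Load the source image as a batch of shape (1, H, W, 3)
img = np.expand_dims(image.img_to_array(image.load_img("data/vase.jpg")), axis=0)

# Write a few augmented copies to disk (directory and prefix are hypothetical)
flow = gen.flow(img, batch_size=1, save_to_dir="report/augmented_img",
                save_prefix="vaseAI", save_format="jpg")
for _ in range(5):
    next(flow)
```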
[Figure 6]: Cosine similarity with augmented images using VGG16
Then I compared cosine similarity performances between both models:
[Figure 7]: Comparison of cosine similarity between VGG16 and ResNet50
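For reference, a sketch of how such a side-by-side comparison could be run, with model-specific preprocessing for each backbone (paths and names are illustrative, not the project's exact code):

```python
import numpy as np
from tensorflow.keras.applications import vgg16, resnet50
from tensorflow.keras.preprocessing import image

# Each backbone is paired with its own preprocess_input function
backbones = {
    "vgg16": (vgg16.VGG16(weights="imagenet", include_top=False, pooling="avg"),
              vgg16.preprocess_input),
    "resnet50": (resnet50.ResNet50(weights="imagenet", include_top=False, pooling="avg"),
                 resnet50.preprocess_input),
}

for name, (model, preprocess) in backbones.items():
    feats = []
    for path in ("data/vase.jpg", "data/flowerpot.jpg"):
        img = image.load_img(path, target_size=(224, 224))
        feats.append(model.predict(preprocess(np.expand_dims(image.img_to_array(img), axis=0))).flatten())
    sim = np.dot(feats[0], feats[1]) / (np.linalg.norm(feats[0]) * np.linalg.norm(feats[1]))
    print(f"{name}: cosine similarity = {sim:.4f}")
```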