Firefly55lm/cnn_for_pokemon_detection

CONVOLUTIONAL NEURAL NETWORK FOR POKÉMON DETECTION

An experimental convolutional neural network for detecting 1st gen Pokémon, made with TensorFlow.

TRY IT OUT!

The model is available on this HuggingFace repository. Here is a Colab Notebook to test it.

DATASET

The training dataset contains 17,000 pictures divided into 143 classes of 1st gen Pokémon.

ARCHITECTURE

(Image: architecture diagram)

The input is preprocessed by resizing to 200x200 and augmented with random horizontal flips and a 20% zoom range.
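As a rough illustration of the preprocessing described above, here is a minimal sketch using Keras preprocessing layers. It is not taken from the repository: the choice of `Resizing`, `RandomFlip` and `RandomZoom` (rather than, for example, an image data generator), the symmetric zoom factor and the dummy batch are all assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Resize every image to 200x200, as stated above.
resize = layers.Resizing(200, 200)

# Augmentation: random horizontal flip and a 20% zoom range.
# Assumption: a symmetric factor of 0.2 (zoom in or out by up to 20%).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomZoom(0.2),
])

# Hypothetical batch of 4 RGB images with an arbitrary original size.
images = tf.random.uniform((4, 320, 240, 3))
resized = resize(images)
# Random augmentation layers are only active when training=True.
augmented = augment(resized, training=True)
print(resized.shape, augmented.shape)
```

Both tensors keep the 200x200 spatial size, since flipping and zooming do not change the image dimensions.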

Every convolutional layer uses a LeakyReLU (alpha = 0.15) activation function to prevent vanishing gradients and the dying ReLU problem, with 'same' padding and 'he_normal' kernel initialization. Every layer pack also contains a Batch Normalization layer, a Max Pooling layer (2x2) and a 20% Dropout to prevent overfitting.
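One layer pack of this kind could be sketched as follows. This is only an approximation of what is described above: the helper name `conv_pack`, the filter count, the 3x3 kernel size and the exact order of the layers inside the pack are assumptions, not details confirmed by the repository.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_pack(x, filters):
    """One convolutional layer pack (hypothetical helper, assumed layer order)."""
    x = layers.Conv2D(
        filters,
        kernel_size=3,                   # assumption: 3x3 kernels
        padding="same",
        kernel_initializer="he_normal",
    )(x)
    # Slope passed positionally: the keyword is `alpha` in older Keras
    # releases and `negative_slope` in Keras 3.
    x = layers.LeakyReLU(0.15)(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=2)(x)
    x = layers.Dropout(0.2)(x)
    return x

inputs = tf.keras.Input(shape=(200, 200, 3))
outputs = conv_pack(inputs, filters=32)  # 32 filters is a placeholder value
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)
```

With 'same' padding the convolution keeps the 200x200 size, so only the 2x2 pooling halves the feature maps in each pack.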

From the 2nd to the 4th convolutional layer pack, the dilation rate increases from 1x1 to 3x3, which enlarges the area covered by each filter and improves feature detection, following the same principle described in this paper. This solution increases the accuracy on the validation and test sets by 3%, reaching 95%.
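To make the effect of the dilation rate concrete, the span covered by a dilated kernel along one axis is k + (k - 1)(d - 1), where k is the kernel size and d the dilation rate. The small sketch below evaluates this for a 3x3 kernel, which is an assumption here (the kernel size of the final architecture is not stated); the function name is made up for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation_rate: int) -> int:
    """Span in pixels covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)

for rate in (1, 2, 3):
    span = effective_kernel_size(3, rate)
    # The number of weights stays at 3 * 3 = 9 regardless of the rate.
    print(f"3x3 kernel, dilation {rate}x{rate}: covers {span}x{span} pixels")
```

The covered area grows from 3x3 to 5x5 to 7x7 while the number of trainable weights per filter stays the same, which is the appeal of dilation over simply using larger kernels.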

After the convolutional layers, learning and classification are performed by two Dense layers with 512 and 256 units, with a 40% Dropout.

Here are plots of the feature maps extracted from every convolutional layer pack: feature_maps_plot_1, feature_maps_plot_2

TESTED SOLUTIONS

If you are curious about the visual differences between the feature maps of different kinds of layers, I made a few plots comparing them with the same 6 filters (initializer = GlorotUniform(seed=5)).

This is the list of tested solutions:

  • Classic Conv2D, 1 layer, 3x3 kernel
  • Depthwise Separable Convolution, 1 layer, 3x3 kernel
  • Classic Conv2D, 1 layer, 5x5 kernel
  • Dilated Convolution, 1 layer, 3x3 kernel, 2x2 dilation rate
  • Dilated Convolution, 1 layer, 3x3 kernel, 3x3 dilation rate
  • Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3
  • Classic Conv2D, 3 layers, 3x3 kernel, with 2 MaxPooling 2x2 layers
  • Classic Conv2D, 3 layers, 3x3 kernel, with 2 AveragePooling 2x2 layers
  • Classic Conv2D, 3 layers, 3x3 kernel, with 1 MaxPooling 2x2 layer and 1 AveragePooling 2x2 layer
  • Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3, with 2 MaxPooling 2x2 layers
  • Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3, with 2 AveragePooling 2x2 layers
  • Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3, with 1 MaxPooling 2x2 layer and 1 AveragePooling 2x2 layer
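The single-layer variants from the list above could be set up roughly as in the sketch below, each with 6 filters and a `GlorotUniform(seed=5)` initializer. The variant names, the random input image and the default 'valid' padding are assumptions (the padding used for the plots is not stated); 'valid' is used here only so that the different coverage of each kernel shows up in the output size.

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.initializers import GlorotUniform

def init():
    # Fresh initializer per layer, seeded as in the text above.
    return GlorotUniform(seed=5)

# Hypothetical names for the single-layer solutions in the list.
variants = {
    "conv_3x3": layers.Conv2D(6, 3, kernel_initializer=init()),
    "separable_3x3": layers.SeparableConv2D(
        6, 3, depthwise_initializer=init(), pointwise_initializer=init()),
    "conv_5x5": layers.Conv2D(6, 5, kernel_initializer=init()),
    "dilated_3x3_rate2": layers.Conv2D(
        6, 3, dilation_rate=2, kernel_initializer=init()),
    "dilated_3x3_rate3": layers.Conv2D(
        6, 3, dilation_rate=3, kernel_initializer=init()),
}

# Placeholder input: one random 200x200 RGB image instead of a real picture.
image = tf.random.uniform((1, 200, 200, 3))
shapes = {name: tuple(layer(image).shape) for name, layer in variants.items()}
for name, shape in shapes.items():
    print(name, shape)
```

Without padding, the 3x3 kernel with a 2x2 dilation rate shrinks the image exactly as much as the 5x5 kernel does, because both span 5 pixels per axis.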

(Plots test1 to test12: feature map comparisons for the tested solutions)