Convolutional neural network experiment for 1st gen Pokémon detection, built with TensorFlow.
The model is available on this HuggingFace repository. Here is a Colab Notebook to test it.
The training dataset contains 17,000 pictures divided into 143 classes of 1st gen Pokémon.
The input images are resized to 200x200 and augmented with a random horizontal flip and a 20% zoom range.
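A minimal sketch of this loading and augmentation step, assuming the pictures are organized in one sub-folder per class (the directory path and batch size are illustrative, not the project's actual values):

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = (200, 200)

# Hypothetical directory layout: one sub-folder per Pokémon class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train",
    image_size=IMG_SIZE,   # resize every picture to 200x200
    batch_size=32,
)

# Augmentation pipeline: random horizontal flip + 20% zoom range.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomZoom(0.2),
])

train_ds = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))
```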
Every convolutional layer uses a LeakyReLU (alpha = 0.15) activation function to prevent vanishing gradients and dying ReLU issues, with 'same' padding and 'he_normal' kernel initialization. Every convolutional block also contains a Batch Normalization layer, a 2x2 Max Pooling layer and a 20% Dropout to prevent overfitting.
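A sketch of one such block, using the Keras functional API (the filter count and the single-Conv2D-per-block layout are assumptions, not the repository's exact configuration):

```python
from tensorflow.keras import layers

def conv_block(x, filters, dilation_rate=(1, 1)):
    """One convolutional block: Conv2D -> LeakyReLU -> BatchNorm -> MaxPool -> Dropout."""
    x = layers.Conv2D(
        filters,
        kernel_size=(3, 3),
        padding="same",
        dilation_rate=dilation_rate,
        kernel_initializer="he_normal",
    )(x)
    x = layers.LeakyReLU(alpha=0.15)(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Dropout(0.2)(x)
    return x
```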
From the 2nd to the 4th convolutional block, the dilation rate increases from 1x1 to 3x3 to enlarge the receptive field of every filter and improve feature detection, following the same principle described in this paper. This solution increases the accuracy on the validation and test sets by 3% (95% accuracy).
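A rough sketch of how the blocks could be stacked with this dilation schedule, reusing the `conv_block` helper from the sketch above (the number of blocks and the filter counts are assumptions based on the description):

```python
from tensorflow.keras import layers

inputs = layers.Input(shape=(200, 200, 3))
x = conv_block(inputs, 32)                    # block 1: default 1x1 dilation
x = conv_block(x, 64, dilation_rate=(1, 1))   # block 2: 1x1 dilation
x = conv_block(x, 128, dilation_rate=(2, 2))  # block 3: 2x2 dilation
x = conv_block(x, 256, dilation_rate=(3, 3))  # block 4: 3x3 dilation
```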
After the convolutional layers, learning and classification are performed by two Dense layers with 512 and 256 units, with a 40% Dropout.
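A sketch of the classification head, continuing from the convolutional stack above; only the 512/256 unit counts, the 40% Dropout rate and the 143-class output follow from the description, while the Flatten layer, the ReLU activations, the optimizer and the loss are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# x and inputs come from the convolutional stack sketched above.
x = layers.Flatten()(x)
x = layers.Dense(512, activation="relu", kernel_initializer="he_normal")(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(256, activation="relu", kernel_initializer="he_normal")(x)
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(143, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```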
Here is a plot of the feature maps extracted from each convolutional block:
If you are curious about the visual differences between the feature maps of different kinds of layers, I made a few plots comparing them using the same 6 filters (initializer = GlorotUniform(seed=5)).
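If you want to reproduce this kind of plot, here is a minimal sketch of feature-map extraction (the layer name, the stand-in input image and the number of plotted channels are illustrative; `model` is assumed to be the Keras model built above):

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Build a sub-model that outputs the activations of one convolutional layer.
layer_name = "conv2d"  # hypothetical layer name; check model.summary() for the real ones
feature_extractor = tf.keras.Model(
    inputs=model.inputs,
    outputs=model.get_layer(layer_name).output,
)

# Stand-in input: a single 200x200x3 picture with a batch dimension.
image = tf.random.uniform((200, 200, 3))
feature_maps = feature_extractor(tf.expand_dims(image, axis=0))

# Plot the first 6 channels, matching the 6-filter comparison described above.
fig, axes = plt.subplots(1, 6, figsize=(18, 3))
for i, ax in enumerate(axes):
    ax.imshow(feature_maps[0, :, :, i], cmap="viridis")
    ax.axis("off")
plt.show()
```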
This is the list of tested solutions (a few of them are sketched in code after the list):
- Classic Conv2D, 1 layer, 3x3 kernel
- Separable Depthwise Convolution, 1 layer, 3x3 kernel
- Classic Conv2D, 1 layer, 5x5 kernel
- Dilated Convolution, 1 layer, 3x3 kernel, 2x2 dilation rate
- Dilated Convolution, 1 layer, 3x3 kernel, 3x3 dilation rate
- Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3
- Classic Conv2D, 3 layers, 3x3 kernel, 2 layers for MaxPooling 2x2
- Classic Conv2D, 3 layers, 3x3 kernel, 2 layers for AveragePooling 2x2
- Classic Conv2D, 3 layers, 3x3 kernel, 1 layer for MaxPooling 2x2, 1 layer for AveragePooling 2x2
- Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3, 2 layers for MaxPooling 2x2
- Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3, 2 layers for AveragePooling 2x2
- Dilated Convolution, 3 layers, 3x3 kernel, dilation rates 1x1-2x2-3x3, 1 layer for MaxPooling 2x2, 1 layer for AveragePooling 2x2
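To make the differences concrete, here is how a few of these variants could be declared in Keras, all sharing the same 6 filters and GlorotUniform(seed=5) initializer mentioned above (the layer declarations are illustrative, not the exact test code):

```python
from tensorflow.keras import layers
from tensorflow.keras.initializers import GlorotUniform

init = GlorotUniform(seed=5)

# Classic Conv2D, 3x3 kernel
classic_3x3 = layers.Conv2D(6, (3, 3), padding="same", kernel_initializer=init)

# Separable depthwise convolution, 3x3 kernel
separable_3x3 = layers.SeparableConv2D(6, (3, 3), padding="same",
                                       depthwise_initializer=init,
                                       pointwise_initializer=init)

# Classic Conv2D, 5x5 kernel
classic_5x5 = layers.Conv2D(6, (5, 5), padding="same", kernel_initializer=init)

# Dilated convolution, 3x3 kernel, 2x2 dilation rate
dilated_2x2 = layers.Conv2D(6, (3, 3), padding="same",
                            dilation_rate=(2, 2), kernel_initializer=init)
```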