I tried both a ResNet and a vanilla CNN, and, as expected, the ResNet gave the better result. The ResNet was built and trained from scratch without any pretrained weights. Its residual (skip) connections helped reduce overfitting.
| Model | Validation Loss | Validation Accuracy |
| --- | --- | --- |
| Vanilla Model | 0.9479 | 66.49% |
| ResNet Model | 0.8840 | 70.57% |
Augmenting the data with transformations helped reduce overfitting. The transformations were:
• Random Horizontal Flip
• Random Rotation with a range of 10°
• Colour Jitter – Brightness and Contrast with a range of 0.2, Hue and Saturation with a range of 0.1
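As a self-contained sketch of the flip and brightness/contrast jitter described above (illustrated here in plain NumPy rather than the actual augmentation library; the rotation and hue/saturation jitter are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, rng):
    """Apply a random horizontal flip and brightness/contrast jitter
    to an HxWxC float image with values in [0, 1]."""
    # Random horizontal flip with probability 0.5
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # Brightness and contrast jitter with a range of 0.2 (factor in [0.8, 1.2])
    brightness = rng.uniform(0.8, 1.2)
    contrast = rng.uniform(0.8, 1.2)
    img = img * brightness
    img = (img - img.mean()) * contrast + img.mean()
    return np.clip(img, 0.0, 1.0)

img = rng.random((32, 32, 3))   # a dummy 32x32 RGB image
aug = augment(img, rng)
print(aug.shape)  # (32, 32, 3)
```

In practice the same augmentations are applied on the fly during training, so the model sees a slightly different version of each image every epoch.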
Dropout layers also helped boost validation accuracy.
I trained the model for between 60 and 150 epochs. For the chosen learning rate, the model gave its best performance at around 90 epochs, i.e. high accuracy without overfitting.
Though ResNet gave a higher test accuracy under supervised learning, the vanilla CNN gave a better final test accuracy when we carried out semi-supervised learning using pseudo-labelling, because the ResNet model overfit the dataset extremely fast.
I tried multiple thresholds ranging from 0.7 all the way to 0.96; the best test accuracy was obtained at 0.93. This can be justified by the fact that a threshold of 0.93 produces pseudo-labels for a fairly large number of images from the unlabelled dataset (13,925 to be exact) while still maintaining a high level of label accuracy.
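The thresholding step itself is simple: keep only those unlabelled images whose maximum softmax probability clears the threshold, and use the argmax as the pseudo-label. A minimal NumPy sketch (the softmax outputs and class count here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical softmax outputs for 10 unlabelled images over 4 classes
probs = rng.dirichlet(np.ones(4) * 0.3, size=10)

THRESHOLD = 0.93  # the value that worked best in my experiments

confidence = probs.max(axis=1)          # model's confidence per image
mask = confidence >= THRESHOLD          # keep only confident predictions
pseudo_labels = probs[mask].argmax(axis=1)
pseudo_indices = np.nonzero(mask)[0]    # which unlabelled images were kept

print(len(pseudo_labels), "images received pseudo-labels")
```

Raising the threshold trades quantity for quality: fewer images are pseudo-labelled, but the labels that survive are more likely to be correct.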
Rather than choosing the number of epochs manually, I used Keras's `EarlyStopping` callback with the `.fit()` function. I set the monitor to `val_loss` and ended training when `val_loss` stopped decreasing.
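The stopping rule behind that callback can be sketched in plain Python (assuming a patience of a few epochs; the real Keras callback additionally supports options like `min_delta` and `restore_best_weights`):

```python
def should_stop(val_losses, patience=5):
    """Return True once val_loss has failed to improve for `patience` epochs.

    Mirrors the core logic of monitoring val_loss with a patience window:
    stop when none of the last `patience` values beats the earlier best.
    """
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent = val_losses[-patience:]
    return min(recent) >= best_before

# val_loss improves early, then plateaus: training should stop
history = [1.2, 1.0, 0.95, 0.96, 0.97, 0.98, 0.99, 1.00]
print(should_stop(history))  # True
```

This is why setting the monitor to `val_loss` matters: stopping on training loss alone would let the model keep fitting noise long after generalisation stops improving.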
• Training the initial supervised classifier
• Training the model further on the new dataset
One drawback of the semi-supervised model is that it overfits. This becomes even clearer if we use the test data as the validation set during fitting. When we train the model on the labelled training set, we get a validation accuracy of approximately 0.66 and a validation loss of about 0.95. However, after adding pseudo-labels to the unlabelled dataset and training the model on it, validation accuracy falls to 0.65 and validation loss increases to 1.02. Though this shift is minor, the training loss continues to decrease while training accuracy increases, which indicates overfitting.
After running a semi-supervised model with both the CNN and the ResNet, it is evident that the model overfits, and thus supervised learning with the ResNet is the optimal solution for the given dataset, giving a validation accuracy of over 0.7 and a validation loss of 0.87.
Since training with a contrastive loss requires heavy computation (two augmented views of each image have to be processed simultaneously, and two backpropagations must then be carried out) and cannot practically be done on a single GPU stream, I imported precomputed SimCLR embeddings. As the classifier, I built a three-layer fully connected neural network and trained it on the embeddings obtained by passing the images through the ResNet50 backbone.
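To make the cost concrete, here is a NumPy sketch of the NT-Xent contrastive loss SimCLR optimises (batch size, embedding dimension, and temperature are illustrative; a real implementation would run on GPU tensors with autograd):

```python
import numpy as np

def nt_xent(z, tau=0.5):
    """NT-Xent loss for 2N embeddings, where rows 2i and 2i+1 are the
    two augmented views of image i. Every view must identify its partner
    among all other 2N-1 embeddings, which is why both views of every
    image must be in memory at once."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z)
    pos = np.arange(n) ^ 1                             # partner index: 2i <-> 2i+1
    log_prob = sim[np.arange(n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))   # 4 images, 2 views each
loss = nt_xent(z)
print(round(float(loss), 4))
```

Since the loss couples every pair in the batch, each step needs two forward passes per image plus a full pairwise similarity matrix, which is what makes training from scratch so expensive compared to training a small classifier on frozen embeddings.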
Since we only need to train a simple classifier, overfitting isn't much of an issue and can be easily avoided using image augmentations and dropout layers. Putting a dropout layer before the first Dense layer helps prevent over-dependence on any of the 2048 features output by the ResNet50 model. Furthermore, only 8 epochs are enough to reach optimal accuracy and minimal loss without overfitting the training set.
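The classifier's forward pass can be sketched as follows (plain NumPy for illustration; the hidden sizes, class count, and dropout rate are assumptions, only the 2048-dimensional input matches the ResNet50 output described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Three fully connected layers: 2048 -> 512 -> 128 -> 10 (sizes hypothetical)
W1, b1 = rng.normal(0, 0.01, (2048, 512)), np.zeros(512)
W2, b2 = rng.normal(0, 0.01, (512, 128)), np.zeros(128)
W3, b3 = rng.normal(0, 0.01, (128, 10)), np.zeros(10)

def forward(x, train=True, p_drop=0.3):
    if train:
        # Dropout before the first Dense layer: randomly zero input features
        # so no single one of the 2048 embedding dimensions dominates
        mask = rng.random(x.shape) >= p_drop
        x = x * mask / (1 - p_drop)   # inverted dropout keeps the scale
    h = relu(x @ W1 + b1)
    h = relu(h @ W2 + b2)
    return softmax(h @ W3 + b3)

x = rng.normal(size=(4, 2048))   # a batch of 4 frozen SimCLR embeddings
probs = forward(x)
print(probs.shape)  # (4, 10)
```

At inference time dropout is disabled (`train=False`), so the full embedding is used.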
Previous models on all datasets have almost exclusively used manual image augmentation, which requires experience with data manipulation.
In SimCLR we apply two data augmentations to each image and compute the loss on the resulting embeddings, so choosing the right augmentations is very important. Choosing two augmentations that are very similar leads to no learning, as the embeddings they generate are inherently similar. At the same time, we do not want augmentations that produce starkly different embeddings, as that leads to a very vague, over-generalised final result. Therefore, if implemented properly, auto-augmentation can increase SimCLR's accuracy by a significant margin.
For datasets like MNIST or OCR detection tasks, augmentations such as aspect-ratio manipulation, stretching, and compression are very useful, while colour shifts won't affect performance much.
On the other hand, for image datasets used to classify objects and animals, like ImageNet, stretching and compression should be restricted to a very small extent, and augmentation should focus more on colour shifts and random cropping. Colour shifts prevent the algorithm from focussing on a given colour, while cropping prevents it from focussing on a single feature of a given object or animal.
By training an automated image augmentation policy, we can let the algorithm itself determine which augmentations create the greatest increase in accuracy. The right augmentation depends not only on the image subject but also on the image composition: darkening an already dark image produces featureless images, which leads to no learning. This is an avenue where an automated augmentation generator has an advantage over manual augmentation.