This repository contains a custom ResNet-18, a deep convolutional neural network trained on the CIFAR-10 dataset. The architecture follows the paper Deep Residual Learning for Image Recognition, which introduces residual learning to ease the training of substantially deeper networks.

With a few changes to the architecture, I achieved a 3% lower error rate on CIFAR-10 than the paper reports, with less training time. The changes were motivated by the paper The Impact of Filter Size and Number of Filters on Classification Accuracy in CNN, which states that filters that shrink the image too much reduce accuracy, so the input size and kernel size should be chosen to keep the feature maps adequately large. The changes are:

- I removed the max pooling from the first layer and set its stride to 1 instead of 2.
- For downsampling, I used max pooling instead of a convolution with a kernel size of 1 and stride=2, so that only the most important features are kept.
- I trained with the Adam optimizer and Cosine Annealing Warm Restarts as the learning rate scheduler, instead of SGD as used in the paper.
- I used more ReLU activations in the hidden layers than the paper does.
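To make the effect of the first-layer change concrete, here is a minimal sketch of the feature-map arithmetic on a 32x32 CIFAR-10 image. It compares the standard ImageNet-style ResNet-18 stem (7x7 convolution with stride 2 and padding 3, then a 3x3 max pool with stride 2 and padding 1) against a stride-1 stem without max pooling. The 3x3 kernel with padding 1 for the modified stem and the 2x2 pooling used to model each later downsampling stage are assumptions made for illustration; the exact values are in the notebook.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a convolution or pooling layer (floor mode, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def after_downsampling_stages(size: int, stages: int = 3) -> int:
    """Apply `stages` halving steps, modelled here as 2x2 pooling with stride 2 (assumption)."""
    for _ in range(stages):
        size = conv_output_size(size, kernel=2, stride=2, padding=0)
    return size


INPUT_SIZE = 32  # CIFAR-10 images are 32x32

# Standard ImageNet-style stem: 7x7 conv (stride 2, padding 3), then 3x3 max pool (stride 2, padding 1).
after_conv = conv_output_size(INPUT_SIZE, kernel=7, stride=2, padding=3)
standard_stem = conv_output_size(after_conv, kernel=3, stride=2, padding=1)

# Modified stem: stride 1 and no max pooling.
# ASSUMPTION: 3x3 kernel with padding 1, chosen only for this illustration.
modified_stem = conv_output_size(INPUT_SIZE, kernel=3, stride=1, padding=1)

# ResNet-18 has three further downsampling stages after the stem.
standard_final = after_downsampling_stages(standard_stem)
modified_final = after_downsampling_stages(modified_stem)

print(f"standard stem: {INPUT_SIZE} -> {after_conv} -> {standard_stem}, final map {standard_final}x{standard_final}")
print(f"modified stem: {INPUT_SIZE} -> {modified_stem}, final map {modified_final}x{modified_final}")
```

Under these assumptions the standard stem shrinks the image to 8x8 before the first residual block and leaves a 1x1 map at the end, while the stride-1 stem keeps the full 32x32 resolution and ends at 4x4.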
Once you have these dependencies installed, you can clone the Custom ResNet-18 repository from GitHub:
https://github.com/Moddy2024/ResNet-18.git
- resNet18.ipynb - This notebook shows how the dataset is downloaded, what the data looks like, the transformations and data augmentations, the architecture of the ResNet, and the training.
- prediction.ipynb - This notebook loads the trained model file and shows how to run predictions on single images, on multiple images contained in a folder, and on images (single or multiple) uploaded temporarily to Google Colab.
- trained-models - This directory contains the best trained model and the model saved after the last epoch.
- test-data - This directory contains test images of different categories, sizes and shapes, collected randomly from the internet, for running predictions and inspecting the results.
This custom ResNet-18 includes the following features:
- Support for multiple image sizes and aspect ratios
- Option to fine-tune the model on a specific dataset
- Ability to save and load trained models
The dataset used to train the model is CIFAR-10. It consists of 60,000 32x32 color images, split into 50,000 training images and 10,000 test images. Each image is labeled with one of 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. There are 5,000 images of each class in the training set and 1,000 images of each class in the test set. CIFAR-10 is a popular benchmark because it is well defined and widely used, and the images are small enough that relatively large models can be trained on a single machine.
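The split sizes and the label mapping can be written down as a short sketch. The class order below is the standard CIFAR-10 index order (0 to 9); the helper name `label_name` is hypothetical and only meant to illustrate turning a predicted class index into a readable label.

```python
# CIFAR-10 class names in their standard index order (0..9).
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)

TRAIN_PER_CLASS = 5_000
TEST_PER_CLASS = 1_000

train_total = len(CIFAR10_CLASSES) * TRAIN_PER_CLASS
test_total = len(CIFAR10_CLASSES) * TEST_PER_CLASS


def label_name(index: int) -> str:
    """Map a predicted class index to its CIFAR-10 label (hypothetical helper)."""
    if not 0 <= index < len(CIFAR10_CLASSES):
        raise ValueError(f"class index out of range: {index}")
    return CIFAR10_CLASSES[index]


print(f"train: {train_total}, test: {test_total}, total: {train_total + test_total}")
print(f"class 3 is '{label_name(3)}'")
```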
The CIFAR-10 dataset (named after the Canadian Institute For Advanced Research) can be downloaded from its official website. It can also be downloaded through torchvision.datasets:
import torchvision
from torchvision import transforms
# Load the CIFAR-10 training and test splits, converting each image to a tensor
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())
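In this repository, the dataset is instead downloaded using fast.ai, as seen below.
from fastai.data.external import untar_data, URLs
# Download and extract the CIFAR-10 archive; untar_data returns the extracted path
data_dir = untar_data(URLs.CIFAR)
# Convert the path object to a plain string for later use
data_dir = str(data_dir)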
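After the download, a quick sanity check is to count the images per class. The sketch below uses only the standard library and assumes the extracted archive has one sub-folder per class inside each split directory (for example `train/airplane/*.png`); that layout and the helper name `count_images_per_class` are assumptions, so adjust them if the folder structure differs.

```python
from pathlib import Path


def count_images_per_class(split_dir, suffix=".png"):
    """Count image files in each class sub-folder of `split_dir` (hypothetical helper).

    ASSUMPTION: `split_dir` contains one directory per class, each holding image files.
    """
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() == suffix
            )
    return counts


# Example usage with the data_dir obtained above (assumes a "train" sub-folder exists):
# print(count_images_per_class(Path(data_dir) / "train"))
```

If the layout matches the assumption, each class in the training split should report 5,000 images.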