Image recognition from scratch

This repository documents the development of a convolutional neural network from scratch. The goal was to classify the MNIST dataset, and the best model achieves an error rate of 0.40%. The repository is also accompanied by a written work; see ImageRecognitionFromScratch.

Models

The following is a brief summary of the different models, each representing a checkpoint in the development process.

To run the code or a specific model, please refer to the running a model section.

The logs of each model can be found by clicking the link below the respective model to browse the repository at that state and opening the best_result.log or best_result.txt file (depending on how old the model is).

The best model using this library is the twelfth, with an error rate of 0.40% on the MNIST test images. To load this model, download the repository at this state and refer to the running a model section. Note that a pretrained model is stored in the models directory. Since the goal of this assignment was to achieve an error rate of 0.30% and this repository's implementation does not have GPU support, a Jupyter notebook is provided with which a TensorFlow model achieving a sub-0.30% error rate can be set up. Said model could be set up with this library as well, but would take a very long time to train.

First model (09-10% error rate)

3f5521c

  • Stochastic gradient descent (batch size of 1)
  • Mean square error loss function (see the sketch after this list)
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 100))  # input_shape=(1, 28*28)   ;   output_shape=(1, 100)
    model.add_layer(ActivationLayer(ActivationFunction.tanh))
    model.add_layer(FCLayer(100, 50))  # input_shape=(1, 100)          ;   output_shape=(1, 50)
    model.add_layer(ActivationLayer(ActivationFunction.tanh))
    model.add_layer(FCLayer(50, 10))  # input_shape=(1, 50)            ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.tanh))
    
  • 9-10% error rate
  • 100 epochs
  • Fixed learning rate of 0.1
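
The mean square error loss drives the backward pass of this first model. As a minimal NumPy sketch (illustrative names, not this library's exact API), the loss and its gradient with respect to the predictions are:

    import numpy as np

    def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        # Mean squared error over the output vector
        return float(np.mean((y_true - y_pred) ** 2))

    def mse_prime(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
        # Gradient of the MSE with respect to the predictions
        return 2.0 * (y_pred - y_true) / y_true.size

With a batch size of 1, this gradient is backpropagated and the parameters are updated after every single training sample.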

Second model (06.75% error rate)

9eac97e

  • Mini batch gradient descent (batch size of 32; see the batching sketch after this list)
  • Mean square error function
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 128))  # input_shape=(1, 28*28)   ;   output_shape=(1, 128)
    model.add_layer(ActivationLayer(ActivationFunction.tanh, 128))
    model.add_layer(FCLayer(128, 10))       # input_shape=(1, 128)     ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.tanh, 10))
    
  • 6.75% error rate
  • 100 epochs
  • Fixed learning rate of 0.1
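
Mini batch gradient descent averages the gradients over 32 samples before each parameter update, which smooths the updates and vectorizes well in NumPy. A hedged sketch of the batching loop (array names are assumptions, not the repository's exact code):

    import numpy as np

    def iterate_minibatches(x_train: np.ndarray, y_train: np.ndarray, batch_size: int = 32):
        # Shuffle once per epoch, then yield consecutive slices of the data
        indices = np.random.permutation(len(x_train))
        for start in range(0, len(x_train), batch_size):
            batch = indices[start:start + batch_size]
            yield x_train[batch], y_train[batch]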

Third model (03.10% error rate)

73111ee

  • Mini batch gradient descent (batch size of 32)
  • Cross entropy loss function
  • Softmax activation function on last layer (see the combined sketch after this list)
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 128))  # input_shape=(1, 28*28)   ;   output_shape=(1, 128)
    model.add_layer(ActivationLayer(ActivationFunction.tanh, 128))
    model.add_layer(FCLayer(128, 10))       # input_shape=(1, 128)     ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10))
    
  • 3.1% error rate
  • 100 epochs
  • Fixed learning rate of 0.1
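
The combination of a softmax output layer with the cross entropy loss is what cuts the error rate in half here: the gradient of the combined expression with respect to the pre-softmax logits collapses to y_pred - y_true. A minimal sketch (illustrative, not the repository's exact implementation):

    import numpy as np

    def softmax(z: np.ndarray) -> np.ndarray:
        # Subtract the max for numerical stability before exponentiating
        e = np.exp(z - np.max(z, axis=-1, keepdims=True))
        return e / np.sum(e, axis=-1, keepdims=True)

    def cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        # y_true is one-hot; clip to avoid log(0)
        return float(-np.sum(y_true * np.log(np.clip(y_pred, 1e-12, 1.0))))

    # For softmax followed by cross entropy, the gradient with respect
    # to the logits simplifies to: dL/dz = y_pred - y_true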

Fourth model (02.64% error rate)

d578b4b

  • Mini batch gradient descent (batch size of 32)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer (see the update sketch after this list)
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 128, optimizer=Optimizer.Adam))  # input_shape=(1, 28*28)   ;   output_shape=(1, 128)
    model.add_layer(ActivationLayer(ActivationFunction.tanh, 128))
    model.add_layer(FCLayer(128, 10, optimizer=Optimizer.Adam))       # input_shape=(1, 128)     ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10))
    
  • 2.64% error rate
  • 30 epochs
  • Fixed learning rate of 0.01
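
Adam keeps exponential moving averages of each parameter's gradient and squared gradient and uses them to scale the update per parameter. A hedged sketch of a single update step with the usual defaults; the FCLayer's internal bookkeeping may differ:

    import numpy as np

    def adam_update(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        # Update the biased first and second moment estimates
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Correct the bias of the zero-initialized moments (t counts steps from 1)
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Scale the step per parameter by the estimated gradient variance
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v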

Fifth model (02.19% error rate)

1a608e1

  • Mini batch gradient descent (batch size of 32)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers (see the sketch after this list)
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 128, optimizer=Optimizer.Adam))  # input_shape=(1, 28*28)   ;   output_shape=(1, 128)
    model.add_layer(ActivationLayer(ActivationFunction.tanh, 128))
    model.add_layer(DropoutLayer(0.2, 128))
    model.add_layer(FCLayer(128, 10, optimizer=Optimizer.Adam))       # input_shape=(1, 128)     ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10))
    
  • 2.19% error rate
  • 50 epochs
  • Fixed learning rate of 0.001
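
The dropout layers randomly zero activations during training to prevent co-adaptation. A minimal sketch of inverted dropout, which rescales the surviving activations so that inference requires no change (an assumption about the implementation detail):

    import numpy as np

    def dropout_forward(x: np.ndarray, rate: float = 0.2, training: bool = True):
        if not training:
            return x, None
        # Keep each activation with probability (1 - rate) and rescale the survivors
        mask = (np.random.rand(*x.shape) >= rate) / (1.0 - rate)
        return x * mask, mask  # the mask is reused in the backward pass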

Sixth model (02.02% error rate)

251738c

  • Mini batch gradient descent (batch size of 16)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 128, optimizer=Optimizer.Adam))  # input_shape=(1, 28*28)    ;   output_shape=(1, 128)
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 128))
    model.add_layer(DropoutLayer(0.2, 128))
    model.add_layer(FCLayer(128, 50, optimizer=Optimizer.Adam))       # input_shape=(1, 128)      ;   output_shape=(1, 50)
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 50))
    model.add_layer(DropoutLayer(0.2, 50))
    model.add_layer(FCLayer(50, 10, optimizer=Optimizer.Adam))        # input_shape=(1, 50)       ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10))
    
  • 2.02% error rate
  • 200 epochs
  • Fixed learning rate of 0.0005

Seventh model (01.93% error rate)

0ae6882

  • Mini batch gradient descent (batch size of 16)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers
  • (Default) Data augmentation (applied with a chance of 0.25; see the sketch after this list)
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 128, optimizer=Optimizer.Adam))    # input_shape=(1, 28*28)    ;   output_shape=(1, 128)
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 128))
    model.add_layer(DropoutLayer(0.2, 128))
    
    model.add_layer(FCLayer(128, 50, optimizer=Optimizer.Adam))         # input_shape=(1, 128)      ;   output_shape=(1, 50)
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 50))
    model.add_layer(DropoutLayer(0.2, 50))
    
    model.add_layer(FCLayer(50, 10, optimizer=Optimizer.Adam))          # input_shape=(1, 50)       ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10))
    
  • 1.93% error rate
  • 100 epochs
  • Fixed learning rate of 0.0005
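
The augmentation step perturbs a training image with the given probability (0.25 here) before it enters the network. A hedged sketch using a random pixel shift; the repository's actual augmentations may differ:

    import numpy as np

    def maybe_augment(image: np.ndarray, chance: float = 0.25) -> np.ndarray:
        # image has shape (28, 28); shift it by up to 2 pixels per axis
        if np.random.rand() >= chance:
            return image
        dy, dx = np.random.randint(-2, 3, size=2)
        shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
        # Zero the rows/columns that wrapped around the border
        if dy > 0:
            shifted[:dy, :] = 0
        elif dy < 0:
            shifted[dy:, :] = 0
        if dx > 0:
            shifted[:, :dx] = 0
        elif dx < 0:
            shifted[:, dx:] = 0
        return shifted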

Eighth model (01.58% error rate)

c07da15

  • Mini batch gradient descent (batch size of 16)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers
  • (Default) Data augmentation (applied with a chance of 0.25)
  • Early stopping (min relative delta of 0.005 and patience of 15)
  • He weight initialization (sketches of early stopping and He initialization follow this list)
  • Model layout:
    model.add_layer(FCLayer(28 * 28, 128, optimizer=Optimizer.Adam, weight_initialization=WeightInitialization.he_bias_zero))  # input_shape=(1, 28*28)    ;   output_shape=(1, 128)
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 128))
    model.add_layer(DropoutLayer(0.2, 128))
    
    model.add_layer(FCLayer(128, 50, optimizer=Optimizer.Adam, weight_initialization=WeightInitialization.he_bias_zero))       # input_shape=(1, 128)      ;   output_shape=(1, 50)
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 50))
    model.add_layer(DropoutLayer(0.2, 50))
    
    model.add_layer(FCLayer(50, 10, optimizer=Optimizer.Adam, weight_initialization=WeightInitialization.he_bias_zero))        # input_shape=(1, 50)       ;   output_shape=(1, 10)
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10))
    
  • 1.58% error rate
  • 175 epochs (early stopping after 91)
  • Fixed learning rate of 0.0005
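
He initialization draws the weights with variance 2 / fan_in, which keeps ReLU activations well scaled across layers, while early stopping aborts training once the validation metric has not improved by the minimum relative delta for more epochs than the patience allows. Hedged sketches of both (names are illustrative, not the library's API):

    import numpy as np

    def he_init(fan_in: int, fan_out: int):
        # Weights ~ N(0, 2 / fan_in), biases zero (matching "he_bias_zero")
        weights = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
        bias = np.zeros((1, fan_out))
        return weights, bias

    class EarlyStopping:
        def __init__(self, min_relative_delta: float = 0.005, patience: int = 15):
            self.min_relative_delta = min_relative_delta
            self.patience = patience
            self.best = float("inf")
            self.bad_epochs = 0

        def should_stop(self, val_loss: float) -> bool:
            # Count an epoch as an improvement only if the relative gain is large enough
            if val_loss < self.best * (1 - self.min_relative_delta):
                self.best = val_loss
                self.bad_epochs = 0
            else:
                self.bad_epochs += 1
            return self.bad_epochs > self.patience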

Ninth model (00.80% error rate)

712a13e

  • Mini batch gradient descent (batch size of 16)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers
  • (Default) Data augmentation (applied with a chance of 0.25)
  • Early stopping (min relative delta 0.005 and patience of 20)
  • He weight initialization
  • Two 2D convolutional layers (see the max pooling sketch after this list)
  • Model layout:
    # Block 1: input_shape=(BATCH_SIZE, 1, 28, 28) output_shape=(BATCH_SIZE, 8, 14, 14)
    model.add_layer(Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=1, NF_number_of_filters=8, H_height_input=28, W_width_input=28, optimizer=Optimizer.Adam))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=8, H_height_input=28, W_width_input=28))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 2: input_shape=(BATCH_SIZE, 8, 14, 14) output_shape=(BATCH_SIZE, 16, 7, 7)
    model.add_layer(Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=8, NF_number_of_filters=16, H_height_input=14, W_width_input=14, optimizer=Optimizer.Adam))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=16, H_height_input=14, W_width_input=14))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 3: input_shape=(BATCH_SIZE, 16, 7, 7) output_shape=(BATCH_SIZE, 16 * 7 * 7)
    model.add_layer(FlattenLayer(D_batch_size=BATCH_SIZE, C_number_channels=16, H_height_input=7, W_width_input=7))
    
    # Block 4: input_shape=(BATCH_SIZE, 16 * 7 * 7) output_shape=(BATCH_SIZE, 10)
    model.add_layer(FCLayer(16 * 7 * 7, 10, optimizer=Optimizer.Adam, convolutional_network=True))
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10, convolutional_network=True))
    
  • 0.80% error rate
  • 150 epochs (early stopping after 29)
  • Fixed learning rate of 0.001
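
The MaxPoolingLayer2D used here halves the spatial resolution (pool size 2, stride 2). When the stride equals the pool size and the input divides evenly, the forward pass reduces to a single reshape, as in this hedged NumPy sketch:

    import numpy as np

    def max_pool_2x2(x: np.ndarray) -> np.ndarray:
        # x has shape (batch, channels, H, W) with H and W divisible by 2
        d, c, h, w = x.shape
        # Split each spatial axis into (output index, within-window index)
        windows = x.reshape(d, c, h // 2, 2, w // 2, 2)
        # Take the max over the two within-window axes
        return windows.max(axis=(3, 5))

    # Example: an input of shape (16, 8, 28, 28) becomes (16, 8, 14, 14)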

Tenth model (00.44% error rate)

b883661

  • Mini batch gradient descent (batch size of 16)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers
  • (Default) Data augmentation (applied with a chance of 0.5)
  • Early stopping (min relative delta 0.005 and patience of 25)
  • He weight initialization
  • Three 2D convolutional layers
  • Model layout:
    # Block 1: input_shape=(BATCH_SIZE, 1, 28, 28) output_shape=(BATCH_SIZE, 16, 14, 14)
    model.add_layer(Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=1, NF_number_of_filters=16, H_height_input=28, W_width_input=28, optimizer=Optimizer.Adam))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=16, H_height_input=28, W_width_input=28))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 2: input_shape=(BATCH_SIZE, 16, 14, 14) output_shape=(BATCH_SIZE, 32, 14, 14)
    model.add_layer(Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=16, NF_number_of_filters=32, H_height_input=14, W_width_input=14, optimizer=Optimizer.Adam))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 3: input_shape=(BATCH_SIZE, 32, 14, 14) output_shape=(BATCH_SIZE, 48, 7, 7)
    model.add_layer(Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=32, NF_number_of_filters=48, H_height_input=14, W_width_input=14, optimizer=Optimizer.Adam))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=48, H_height_input=14, W_width_input=14))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 4: input_shape=(BATCH_SIZE, 48, 7, 7) output_shape=(BATCH_SIZE, 48 * 7 * 7)
    model.add_layer(FlattenLayer(D_batch_size=BATCH_SIZE, C_number_channels=48, H_height_input=7, W_width_input=7))
    
    # Block 5: input_shape=(BATCH_SIZE, 48 * 7 * 7) output_shape=(BATCH_SIZE, 10)
    model.add_layer(FCLayer(48 * 7 * 7, 10, optimizer=Optimizer.Adam, convolutional_network=True))
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10, convolutional_network=True))
    
  • 0.44% error rate
  • 150 epochs (early stopping after 48)
  • Tunable learning rate scheduler (starting learning rate of 0.001)

Eleventh model (00.42% error rate)

b91ea97

  • Mini batch gradient descent (batch size of 16)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers
  • (Default) Data augmentation (applied with a chance of 0.5)
  • Early stopping (min relative delta 0.005 and patience of 25)
  • He weight initialization
  • Four 2D convolutional layers
  • Model layout:
    # Block 1: input_shape=(BATCH_SIZE, 1, 28, 28) output_shape=(BATCH_SIZE, 16, 28, 28)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=1, NF_number_of_filters=16, H_height_input=28,
                      W_width_input=28, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 2: input_shape=(BATCH_SIZE, 16, 28, 28) output_shape=(BATCH_SIZE, 32, 14, 14)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=16, NF_number_of_filters=32, H_height_input=28,
                      W_width_input=28, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=32,
                                      H_height_input=28, W_width_input=28))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 3: input_shape=(BATCH_SIZE, 32, 14, 14) output_shape=(BATCH_SIZE, 48, 14, 14)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=32, NF_number_of_filters=48, H_height_input=14,
                      W_width_input=14, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 4: input_shape=(BATCH_SIZE, 48, 14, 14) output_shape=(BATCH_SIZE, 64, 7, 7)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=48, NF_number_of_filters=64, H_height_input=14,
                      W_width_input=14, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=64,
                                      H_height_input=14, W_width_input=14))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 5: input_shape=(BATCH_SIZE, 64, 7, 7) output_shape=(BATCH_SIZE, 64 * 7 * 7)
    model.add_layer(FlattenLayer(D_batch_size=BATCH_SIZE, C_number_channels=64, H_height_input=7, W_width_input=7))
    
    # Block 6: input_shape=(BATCH_SIZE, 64 * 7 * 7) output_shape=(BATCH_SIZE, 10)
    model.add_layer(FCLayer(64 * 7 * 7, 10, optimizer=optimizer, convolutional_network=True))
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10, convolutional_network=True))
    
  • 0.42% error rate
  • 150 epochs (early stopping after 70)
  • Tunable learning rate scheduler (starting learning rate of 0.001), halving the learning rate every 5 epochs

Twelfth model (00.40% error rate)

403d08d

  • Mini batch gradient descent (batch size of 16)
  • Cross entropy loss function
  • Softmax activation function on last layer
  • Adam optimizer
  • Dropout layers
  • (Default) Data augmentation (applied with a chance of 0.8)
  • Early stopping (min relative delta 0.005 and patience of 15)
  • He weight initialization
  • Four 2D convolutional layers
  • Model layout:
    # Block 1: input_shape=(BATCH_SIZE, 1, 28, 28) output_shape=(BATCH_SIZE, 16, 28, 28)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=1, NF_number_of_filters=16, H_height_input=28,
                      W_width_input=28, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 2: input_shape=(BATCH_SIZE, 16, 28, 28) output_shape=(BATCH_SIZE, 32, 14, 14)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=16, NF_number_of_filters=32, H_height_input=28,
                      W_width_input=28, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=32,
                                      H_height_input=28, W_width_input=28))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 3: input_shape=(BATCH_SIZE, 32, 14, 14) output_shape=(BATCH_SIZE, 48, 14, 14)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=32, NF_number_of_filters=48, H_height_input=14,
                      W_width_input=14, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 4: input_shape=(BATCH_SIZE, 48, 14, 14) output_shape=(BATCH_SIZE, 64, 7, 7)
    model.add_layer(
        Convolution2D(D_batch_size=BATCH_SIZE, C_number_channels=48, NF_number_of_filters=64, H_height_input=14,
                      W_width_input=14, optimizer=optimizer))
    model.add_layer(ActivationLayer(ActivationFunction.ReLu, 0, convolutional_network=True))
    model.add_layer(MaxPoolingLayer2D(D_batch_size=BATCH_SIZE, PS_pool_size=2, S_stride=2, C_number_channels=64,
                                      H_height_input=14, W_width_input=14))
    model.add_layer(DropoutLayer(0.2, 0, convolutional_network=True))
    
    # Block 5: input_shape=(BATCH_SIZE, 64, 7, 7) output_shape=(BATCH_SIZE, 64 * 7 * 7)
    model.add_layer(FlattenLayer(D_batch_size=BATCH_SIZE, C_number_channels=64, H_height_input=7, W_width_input=7))
    
    # Block 6: input_shape=(BATCH_SIZE, 64 * 7 * 7) output_shape=(BATCH_SIZE, 10)
    model.add_layer(FCLayer(64 * 7 * 7, 10, optimizer=optimizer, convolutional_network=True))
    model.add_layer(ActivationLayer(ActivationFunction.softmax, 10, convolutional_network=True))
    
  • 0.40% error rate
  • 150 epochs (early stopping after 42)
  • Tunable learning rate scheduler (starting learning rate of 0.001), halving the learning rate every 5 epochs (and every 3 epochs after the 20th epoch); see the sketch below
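
A hedged sketch of this schedule as a pure function of the epoch (the exact boundary handling in the repository may differ):

    def learning_rate(epoch: int, initial_lr: float = 0.001) -> float:
        # Halve every 5 epochs up to epoch 20, then every 3 epochs thereafter
        if epoch <= 20:
            halvings = epoch // 5
        else:
            halvings = 20 // 5 + (epoch - 20) // 3
        return initial_lr * 0.5 ** halvings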

Running a model

To start up the application, the dependencies have to be installed first. uv is the recommended tool for this; an installation guide can be found here. If pipx is already installed on the machine, it is as easy as

pipx install uv

After having installed uv, to create a venv and install the necessary dependencies, run:

uv python install
uv sync --all-extras --dev

The above will install all dependencies. To finish the setup of the Python environment, please also run:

set -a
source .env

The project can now be run with

uv run src/main.py

However, the project uses poethepoet as a task runner. To install poethepoet with pipx, run

pipx install poethepoet

Now the application can be started by running

poe run

To run a specific model, click on the link provided below the model in this README, download the source code of that specific commit, and proceed as described above.

For some models, a pre-trained model is provided in the models directory. This is either a zipped model, which has to be extracted first, or a .pkl file whose name can be provided at line 61 in main.py as the model_to_load variable.
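
Assuming the .pkl files are plain pickle dumps of the model object, loading one manually might look like this sketch (the file name is hypothetical; substitute the actual file from the models directory):

    import pickle

    # Hypothetical file name; use the .pkl file shipped with the model
    with open("models/best_result.pkl", "rb") as f:
        model = pickle.load(f)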
