A simple OCR project using the EMNIST (Cohen et al., 2017) dataset.
Run the notebook demo.ipynb to train the classifier and show the demo.
The pipeline works as follows:
- The raw image is denoised and binarized.
- Individual contours are detected with OpenCV, and the corresponding regions of interest (ROIs) are extracted and converted into 28x28 grayscale images.
- The ROIs are fed into a CNN that has been trained on a subset of the EMNIST dataset.
The classifier is a convolutional neural network (CNN), a variation of the LeNet (LeCun et al., 1998) architecture. For now, it is trained only on the upper-case letters A-Z of the "By_Class" split of the EMNIST dataset.
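
As an illustration of restricting training to upper-case letters, the sketch below filters a labelled sample list and remaps the labels to 0..25. It assumes the usual By_Class label layout (0-9 digits, 10-35 upper case, 36-61 lower case); the names `select_uppercase` and `index_to_letter` are hypothetical and not taken from the notebook.

```python
import string

# Assumed By_Class label layout: digits, then upper case, then lower case.
BY_CLASS_CHARS = string.digits + string.ascii_uppercase + string.ascii_lowercase
UPPER_START, UPPER_END = 10, 36  # labels 10..35 are A..Z under this assumption


def select_uppercase(samples, labels):
    """Keep only upper-case samples and shift their labels to the range 0..25."""
    kept_samples, kept_labels = [], []
    for sample, label in zip(samples, labels):
        if UPPER_START <= label < UPPER_END:
            kept_samples.append(sample)
            kept_labels.append(label - UPPER_START)
    return kept_samples, kept_labels


def index_to_letter(index):
    """Map a classifier output index (0..25) back to its letter."""
    return string.ascii_uppercase[index]
```

With this remapping the network needs only 26 output units, and `index_to_letter` turns a predicted class index back into a character for display.
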