A DNN (U-Net) module that performs deskewing, auto-alignment and cropping of documents from an image.
The architecture was inspired by *U-Net: Convolutional Networks for Biomedical Image Segmentation*.
A small application built on top of the U-Net segmentation architecture: it segments the document from the image and applies a perspective transform to it.
Running the image through the model yields a mask of the document.
The mask is then preprocessed to further isolate the document from the background.
The Canny edge detector is applied to the mask to detect the document's edges, and the optimal contour is then identified.
A perspective transform is applied to the original image using the four corner points obtained from that contour.
This is an improvement over the previous model, which applied Canny directly to the image without segmentation and had very poor accuracy in detecting the document's edges.
The source images are a 3-D volume TIFF; you should split the stack into individual images first.
The training data consists of 30 images of size 256*256.
Create the following directory structure:

```
|- data
|  |- npydata
|  |- train
|  |  |- image
|  |  |- label
|  |- test
|- results
|- static
|  |- FinalTransformedDoc
|- templates
|- uploads
```
This deep neural network is implemented with the Keras functional API, which makes it easy to experiment with different architectures.
The output of the network is a 256*256 image representing the mask to be learned. The sigmoid activation function ensures that mask pixels are in the [0, 1] range.
The model is trained for 10 epochs; after 10 epochs, the calculated accuracy is about 0.91.
The loss function used for training is binary cross-entropy.
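As a rough sketch of the kind of model such a functional-API constructor builds, here is a miniature U-Net-style network with a sigmoid mask output and binary cross-entropy loss. Layer counts and filter sizes are illustrative stand-ins, not the repo's real configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def get_unet_sketch(size=256):
    """A tiny U-Net-like model: one down/up level plus a skip connection."""
    inputs = layers.Input((size, size, 1))
    # Contracting path (the real model repeats this with more filters)
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    # Bottleneck
    b = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    # Expanding path with the skip connection characteristic of U-Net
    u1 = layers.UpSampling2D(2)(b)
    u1 = layers.concatenate([u1, c1])
    c2 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)
    # Sigmoid keeps each mask pixel in [0, 1]
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c2)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The skip connection (`concatenate`) is what distinguishes U-Net from a plain encoder-decoder: it lets the decoder reuse high-resolution features from the encoder.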
This project depends on the following libraries:
- TensorFlow
- Keras >= 1.0
- libtiff (optional)
- OpenCV
- NumPy
- os (part of the Python standard library)

The code should be compatible with Python versions 2.7-3.6.
First, split the 3-D volume TIFF into 30 256*256 images. To do so, run `python compress.py`, providing the right input and output directories.
The labels have to be black-and-white masks, 256*256, named serially starting from 0.
- Check out `get_unet()` in `unet.py` to modify the model, optimizer and loss function.
- Run `python train.py` to train the model. After this script finishes, masks for the corresponding images in `imgs_test.npy` should be generated in `imgs_mask_test.npy`. I suggest you examine these masks to gain further insight into your model's performance.
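One simple sanity check when examining the generated masks is the fraction of pixels each mask classifies as document. The helper below is illustrative, not part of the repo:

```python
import numpy as np

def mask_coverage(masks, thresh=0.5):
    """Fraction of pixels predicted as document, per mask in the batch."""
    binary = np.asarray(masks) > thresh              # binarize sigmoid outputs
    return binary.reshape(binary.shape[0], -1).mean(axis=1)

# Typical usage with the file named in the text above:
# masks = np.load("imgs_mask_test.npy")              # e.g. shape (N, 256, 256, 1)
# print(mask_coverage(masks))                        # coverage near 0 or 1 is suspicious
```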
- Run `python test.py` to get the masks of the images. After it's done, the resulting masks are saved in `results`.
- Run `python document-edge-detect.py`. The final document is saved in `FinalTransformedDoc`.
To run mask generation and edge detection together, uncomment the first three commented lines under the main function.
Give a ⭐️ if this project helped you!