Skew Correction, Text Inversion, Rotation Classification, Homography & Object Search with Applied Math
Text Inversion
Detecting inverted text in an image is a hard problem. Inversion happens when a document is scanned upside-down. A document can also end up inverted after rotation or skew correction, because a rotation of 90+Θ is detected as 90-Θ, or -90-Θ as -90+Θ. Text inversion is therefore common, yet difficult to recognize.
Let's see how to formulate inversion mathematically.
Method 1: Double Peaks
- Project the pixels onto the y-axis. Each text line produces a peak, in fact two peaks, because of the shape of English characters.
- Convolve the projection with a Gaussian filter to smooth out noise.
- Calculate the fraction of peaks (lines) with sub-peaks on the right side.
This will not work if the text is in CAPITAL letters or in some other language, as the "double peak logic" would likely falter.
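The first two steps of Method 1 can be sketched with NumPy alone. This is a minimal illustration, assuming the page is already binarized (ink = 1, background = 0); the function names, `sigma`, and the peak rule are hypothetical choices, and the sub-peak test of the last step is left out because its exact criterion is not specified here.

```python
import numpy as np

def row_profile(binary):
    """Project ink pixels onto the y-axis: one sum per image row."""
    return np.asarray(binary, dtype=float).sum(axis=1)

def gaussian_smooth(profile, sigma=2.0):
    """Convolve the profile with a normalized Gaussian kernel (sigma is an assumption)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(profile, kernel, mode="same")

def local_peaks(profile, min_height=0.0):
    """Indices of local maxima that rise above min_height."""
    p = np.asarray(profile)
    is_peak = (p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:]) & (p[1:-1] > min_height)
    return (np.flatnonzero(is_peak) + 1).tolist()

# Synthetic page with two "text lines" (rows 10-14 and 40-44)
page = np.zeros((60, 40))
page[10:15] = 1
page[40:45] = 1
smooth = gaussian_smooth(row_profile(page))
print(local_peaks(smooth, min_height=0.5 * smooth.max()))
```

On real text a smaller `sigma` is likely needed, since heavy smoothing would merge the two sub-peaks of a line into one.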
Another numerical way to address the problem is to make use of the font shape, for example with the 'Water Fill Technique', or to represent the character shape mathematically, as given below. **We can describe any shape mathematically using shape context and log-bin histograms.**
Method 2: Shape Contexts using Log-Bin Histograms
a. Find text bounding boxes in the image using EAST (described below).
b. Crop the image inside each bounding box and apply Canny edge detection.
c. Take a dummy image with alphanumeric characters as the base input. Find bounding boxes around each character in the base input and in the image from step (b). Do steps (d)-(g) to find the best correspondence between character pairs.
d. Randomly sample N points from the edge elements of each character shape.
e. Construct a shape descriptor, the shape context, at each sampled point. The shape context at a point captures the distribution over relative positions of the other shape points as a log-polar histogram, and thus summarizes global shape.
f. Compare the log-polar histograms using Pearson's chi-squared test or cosine distance.
g. Find the character with minimum distance for each bounding box in the base image. Sum up the cost values of each bounding box to find Sigma(Φ).
h. Invert the cropped image from step (b) and repeat steps (d)-(g) to **compute Sigma(Φ').** Compare the two Sigma values to determine whether the text is inverted.
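The descriptor and comparison steps above can be sketched as follows. This is a simplified illustration, not the full pipeline: it assumes edge points of one character and one reference glyph are already available (EAST, cropping and Canny are skipped), uses 5 radial and 12 angular bins as hypothetical defaults, and matches each point to its cheapest counterpart instead of solving an optimal assignment. All function names are made up for this example.

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12):
    """Log-polar histogram of the other points, one row per point."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]        # diff[i, j] = pts[j] - pts[i]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    others = ~np.eye(n, dtype=bool)
    dist = dist / dist[others].mean()               # scale normalization
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    # Assumption: distances outside the edges are clipped into the end bins
    r_bin = np.clip(np.searchsorted(r_edges, dist) - 1, 0, n_r - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    flat = r_bin * n_theta + t_bin
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        hists[i] = np.bincount(flat[i, others[i]], minlength=n_r * n_theta)
    return hists

def chi2_cost_matrix(h_a, h_b):
    """Chi-squared distance between every pair of normalized histograms."""
    a = h_a / h_a.sum(axis=1, keepdims=True)
    b = h_b / h_b.sum(axis=1, keepdims=True)
    num = (a[:, None, :] - b[None, :, :]) ** 2
    den = a[:, None, :] + b[None, :, :]
    ratio = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return 0.5 * ratio.sum(axis=2)

def matching_cost(shape, reference):
    """Sum of the cheapest match for each point (greedy, not an optimal assignment)."""
    cost = chi2_cost_matrix(shape_contexts(shape), shape_contexts(reference))
    return float(cost.min(axis=1).sum())

def is_inverted(shape, reference):
    """True if the 180-degree rotated shape matches the reference better."""
    flipped = -np.asarray(shape, dtype=float)
    return matching_cost(flipped, reference) < matching_cost(shape, reference)
```

Shape contexts are invariant to translation and, after normalization, to scale, but not to rotation, which is what makes the upright versus inverted comparison informative.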
EAST (An Efficient and Accurate Scene Text Detector)
The textual content inside an image can be localized using the EAST algorithm.
Here again, we can use a math hack to localize text in an image instead of the AI-based EAST algorithm. Find consecutive local minima of the y-projection of pixels; these troughs correspond to the separation between text lines. Once a line is found, you can run Method 2, starting from step (b).
The above method would work irrespective of font case or language.
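The projection-based line localization described above could look like the sketch below. It assumes a binarized image (ink = 1) and simplifies the trough search to a threshold: rows whose projection is at or below `gap_threshold` (a hypothetical parameter) are treated as the gaps between lines.

```python
import numpy as np

def split_lines(binary, gap_threshold=0):
    """Return (start_row, end_row_exclusive) for each run of rows containing ink."""
    profile = np.asarray(binary).sum(axis=1)         # y-projection
    ink = profile > gap_threshold
    padded = np.concatenate(([False], ink, [False]))
    change = np.flatnonzero(padded[1:] != padded[:-1])
    # Even entries are run starts, odd entries are run ends
    return [(int(s), int(e)) for s, e in zip(change[::2], change[1::2])]

page = np.zeros((60, 40), dtype=int)
page[10:15] = 1
page[40:45] = 1
print(split_lines(page))
```

Each returned row range can then be cropped and passed to Method 2.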
Skew Correction
Most scanned documents are skewed, so the image needs to be de-skewed before it is fed to an OCR engine or even displayed.
Method 1: Iterative Projection
- Rotate the image to the starting angle of -10 degrees.
- Compute the projection of all pixels onto the y-axis.
- Calculate the pixel incidence density.
- Step the rotation angle by 0.5 degrees and repeat the projection and density steps, up to +10 degrees.
- Find the angle Θ with maximum pixel incidence density.
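A minimal sketch of this search, assuming the input is an array of (x, y) ink-pixel coordinates rather than an image, so that only the coordinates are rotated. The "pixel incidence density" is scored here as the sum of squared row counts, which is an assumption made for illustration; the sign convention of the angle is likewise arbitrary.

```python
import numpy as np

def projection_score(points, angle_deg):
    """Sharpness of the y-projection after undoing a rotation of angle_deg."""
    pts = np.asarray(points, dtype=float)
    a = np.deg2rad(angle_deg)
    # y-coordinate of every point after rotating by -angle_deg
    y = -pts[:, 0] * np.sin(a) + pts[:, 1] * np.cos(a)
    rows = np.round(y).astype(int)
    counts = np.bincount(rows - rows.min())
    # Assumed density measure: large when points pile up in few rows
    return float(np.sum(counts.astype(float) ** 2))

def estimate_skew(points, limit=10.0, step=0.5):
    """Try every angle in [-limit, +limit] and keep the best-scoring one."""
    angles = np.arange(-limit, limit + step / 2, step)
    scores = [projection_score(points, a) for a in angles]
    return float(angles[int(np.argmax(scores))])

# Four horizontal lines, skewed by 3 degrees
flat = np.array([(x, y0) for y0 in (20, 40, 60, 80) for x in range(200)], dtype=float)
a = np.deg2rad(3.0)
rotation = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
print(estimate_skew(flat @ rotation.T))
```

The loop over 41 candidate angles is the iterative cost that the drawbacks below refer to.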
The drawbacks of the above algorithm are:
a. Iterative computation increases time complexity.
b. Potential error of 0.5 degrees due to step size.
Scanned documents are mostly forms or tabular data containing lines, or a point spread of lines (lines can be disjoint in a scanned image because of poor scan or print quality). Hence, the question boils down to: can we compute the line and Θ, given a point spread as input?
Method 2: Hough Transform Peak
- Read the skewed image and run Canny edge detection.
- Hough Space = Call Hough_Transform (Edge Detected Image)
- Find the maxima in the Hough space (accumulator matrix).
- Find Θ of each significant line as the inverse tangent of its slope.
- Calculate the median of these angles, Θ'.
- Rotate the image by Θ'.
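To show what the accumulator step does, here is a from-scratch NumPy sketch. It assumes the edge detector has already produced an array of (x, y) edge coordinates (Canny is skipped), and it reads the angle directly from the accumulator's theta axis instead of from slopes. The names and the `top_k` and `theta_step` parameters are hypothetical.

```python
import numpy as np

def hough_accumulator(points, theta_step=0.5):
    """Vote in (rho, theta) space with rho = x*cos(theta) + y*sin(theta)."""
    pts = np.asarray(points, dtype=float)
    thetas = np.arange(0.0, 180.0, theta_step)       # normal angles in degrees
    t = np.deg2rad(thetas)
    rho = np.round(pts[:, :1] * np.cos(t) + pts[:, 1:] * np.sin(t)).astype(int)
    offset = -rho.min()                              # shift so indices start at 0
    acc = np.zeros((rho.max() + offset + 1, len(thetas)), dtype=int)
    for j in range(len(thetas)):
        acc[:, j] = np.bincount(rho[:, j] + offset, minlength=acc.shape[0])
    return acc, thetas

def estimate_skew_hough(points, top_k=4, theta_step=0.5):
    """Median direction of the top_k strongest lines, in degrees."""
    acc, thetas = hough_accumulator(points, theta_step)
    strongest = np.argsort(acc, axis=None)[-top_k:]
    _, theta_idx = np.unravel_index(strongest, acc.shape)
    # A line whose normal is at theta runs along theta - 90 degrees
    return float(np.median(thetas[theta_idx] - 90.0))

# Four lines skewed by 3 degrees, as edge-point coordinates
flat = np.array([(x, y0) for y0 in (20, 40, 60, 80) for x in range(200)], dtype=float)
a = np.deg2rad(3.0)
rotation = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
print(estimate_skew_hough(flat @ rotation.T))
```

Broken or dotted lines still vote for the same (rho, theta) cell, which is why this approach tolerates the disjoint lines mentioned above.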
*Skew Correction Functional Workflow*
Rotation Classification
Rotation is a common problem in scanned images. The document can be rotated by 90° or more while being scanned.
You can use the above skew correction code to find Θ and rotate. The only drawback is that a rotation of 90+Θ could be detected as 90-Θ, and -90-Θ as -90+Θ. Hence, the image can end up flipped once you rotate!
To solve this, pass the de-skewed image to the text inversion code and flip it upright if necessary.
Homography
Let's say you want to find an object (template) inside a bigger image with multiple objects. We can use Object detection models like SSD or YOLO with annotated Query Images to train different classes of objects to be found. But how do we use simple math to find and locate an object in a bigger image?
We can use homography to find point correspondences and transform the coordinates from one perspective to another. Homography is a transformation (a 3×3 matrix) that maps the points in one image to the corresponding points in the other image.
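To make the 3×3 mapping concrete, the sketch below estimates a homography from point pairs with the direct linear transform (DLT) and applies it in homogeneous coordinates. It uses NumPy only and assumes exact, noise-free correspondences; `fit_homography` and `apply_homography` are illustrative names, not library functions.

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src, from at least 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Two linear constraints on the 9 entries of H per correspondence
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)          # null vector of the constraint matrix
    return h / h[2, 2]                # fix the free scale (assumes h[2, 2] != 0)

def apply_homography(h, pts):
    """Map 2-D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

h_true = np.array([[1.2, 0.1, 0.3], [-0.2, 0.9, 0.5], [0.05, 0.02, 1.0]])
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.2]])
dst = apply_homography(h_true, src)
print(np.round(fit_homography(src, dst), 3))
```

With noisy matches from real images, a robust estimator such as RANSAC is normally wrapped around this least-squares core.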
These are the steps you can follow.
- Firstly, open the template image and the image to be matched.
- Find all features from both input images.
- Create an ORB keypoint detector, which is less compute-intensive than SIFT and SURF.
- Find the key points and their descriptors with the orb detector.
- Create matches of descriptors, then sort them based on distances.
- Use cv2.drawMatchesKnn to draw all the k best matches.
- Extract the matched keypoints from both images.
- Find the homography matrix and apply the perspective transform.
Object Search
Let's say you need to find an object in a set of images. You can use an AI model, as it is a classic case of image classification. But can we use traditional math to do this? Here's how…
- Read image of the object to search (Query Image)
- Do Canny edge detection and find bounding box around contour.
- Randomly sample 'n' points to describe the shape inside the image.
- Iterate and get all images inside the input folder.
- Do steps 2 & 3 on each image.
- **Compute the correlation distance between the random shape points of the 'Query Image' and the shape points of each image in the folder.**
- Find the image with the minimum correlation distance. This image contains the nearest match of the object you are searching for.
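Assuming the correlation value used here is the standard correlation distance between two flattened point vectors u and v (the symbols are illustrative, not taken from the original), it can be written as:

```latex
d(u, v) = 1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\lVert u - \bar{u} \rVert_2 \, \lVert v - \bar{v} \rVert_2}
```

Here \bar{u} and \bar{v} are the means of the entries of u and v. The distance is 0 for perfectly correlated vectors and grows toward 2 as they become anti-correlated.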
The above equation conceptually formulates correlation as the similarity in deviation around the mean. The numerator thus signifies distribution similarity, and the denominator is the L2-norm used for normalization.
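A compact sketch of the search loop, assuming every shape is already available as an array of (x, y) edge points (reading images, Canny and contour extraction are skipped). Ordering the sampled points by angle around the centroid before flattening is an assumption added here so that the two vectors are comparable; all names are hypothetical.

```python
import numpy as np

def correlation_distance(u, v):
    """1 - Pearson correlation of two equally long vectors."""
    u = np.asarray(u, dtype=float) - np.mean(u)
    v = np.asarray(v, dtype=float) - np.mean(v)
    return 1.0 - float(u @ v) / float(np.linalg.norm(u) * np.linalg.norm(v))

def describe(points, n, rng):
    """Randomly sample n points, center them, and flatten in angular order."""
    pts = np.asarray(points, dtype=float)
    sample = pts[rng.choice(len(pts), size=n, replace=False)]
    sample = sample - sample.mean(axis=0)
    order = np.argsort(np.arctan2(sample[:, 1], sample[:, 0]))
    return sample[order].ravel()

def nearest_shape(query, candidates, n=50, seed=0):
    """Return the candidate name with the smallest distance, plus all distances."""
    rng = np.random.default_rng(seed)
    q = describe(query, n, rng)
    dists = {name: correlation_distance(q, describe(pts, n, rng))
             for name, pts in candidates.items()}
    return min(dists, key=dists.get), dists
```

Because each call to `describe` draws a fresh random sample, the distance between two copies of the same shape is small but generally not zero.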
Input Images and Compare Value
Please note that a different car (purple) with a similar shape has the second nearest match value, right after the red car. The correlation distances to the other shapes are distinctly larger. Thus you can see that shape matching works.
Please note that the correlation values will not be 0, even for identical images, because random sampling of points is used to describe the shapes. There are other ways to describe shapes without random sampling, but the time complexity of shape matching would become an order higher. One such method, known as Turning Function, is depicted below.
References
[1] Inversion Detection in Text Document Images. Hamid Pilevar, A. G. Ramakrishnan, Medical Intelligence and Language Engineering Lab, Department of Electrical Engineering, Indian Institute of Science, Bangalore (JCIS 2006)
[2] Shape Context: A new descriptor for shape matching and object recognition. Serge Belongie, Jitendra Malik and Jan Puzicha. Department of Electrical Engineering and Computer Sciences, University of California at Berkeley (NIPS 2000)
[3] Shape Matching and Object Recognition Using Shape Contexts. Serge Belongie, Jitendra Malik and Jan Puzicha. Computer Science Division, University of California at Berkeley (PAMI 2002)