HolySheet

Word crawler for ancient documents. This software provides methods for binarization and word segmentation of ancient documents, such as the Book of Genesis from the Holy Bible. In particular, it can:

Process .png scans of ancient documents with Python libraries such as OpenCV
Segment words using pixel-histogram techniques and specific heuristics
Prepare a dataset for the Detectron neural network

To achieve this, we process each image in several steps. First, we rotate it to straighten it up. Then we binarize it using OpenCV's threshold() method.

Original Genesis image and its binarization.
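The binarization step can be sketched as follows. This is a minimal NumPy illustration, not the project's own code: it assumes a global Otsu threshold (the README only says that threshold() is used, so the actual threshold choice may differ) and marks ink as 1 and background as 0, the same convention as OpenCV's THRESH_BINARY_INV. The function names otsu_threshold and binarize are hypothetical. With OpenCV installed, cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU) computes the same kind of automatic threshold.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold (0-255) for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)          # probability of the dark class for each t
    mu = np.cumsum(prob * levels)    # cumulative mean intensity up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; thresholds that leave one class empty score 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Hypothetical helper: dark (ink) pixels -> 1, background -> 0."""
    t = otsu_threshold(gray)
    return (gray <= t).astype(np.uint8)
```

Returning a 0/1 ink mask rather than 0/255 keeps the later histogram steps simple, because summing a row or column directly counts its ink pixels.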

Next, we compute a pixel histogram along the rows of the binarized image, which lets us segment it into individual lines.

Line segmentation.
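A minimal sketch of histogram-based line segmentation, assuming a binarized page where ink pixels are 1: summing each row gives a horizontal projection histogram, and each run of consecutive rows whose count exceeds a threshold is taken as one text line. The helper names and the threshold parameter are hypothetical and not taken from the repository.

```python
import numpy as np

def runs_above(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, of consecutive entries > threshold."""
    mask = np.asarray(profile) > threshold
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    # Edges alternate: start of a run, end of the same run.
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))

def segment_lines(binary, threshold=0):
    """binary: 2-D array with ink == 1. Returns the row ranges of text lines."""
    row_profile = binary.sum(axis=1)  # ink pixels per row
    return runs_above(row_profile, threshold)
```

Raising threshold above 0 makes the split more tolerant of stray specks between lines, at the cost of possibly clipping thin ascenders or descenders.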

Finally, we repeat the histogram step on each line and refine the results with another method, calimero(), which helps us spot periods (full stops). This yields the segmentation of individual words.

Word segmentation.
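Word segmentation within a line can be sketched the same way, using the column sums of the line instead of the row sums. This is an illustrative assumption and not the calimero() heuristic: runs of ink separated by fewer than a hypothetical min_gap empty columns are merged, on the reasoning that gaps between letters are usually narrower than gaps between words.

```python
import numpy as np

def segment_words(line, min_gap=3):
    """line: 2-D array (ink == 1) for one text line. Returns word column ranges.

    Ink runs separated by fewer than min_gap empty columns are treated as
    letters of the same word; wider gaps are treated as word boundaries.
    """
    col_profile = line.sum(axis=0)  # ink pixels per column
    mask = col_profile > 0
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    runs = list(zip(edges[::2].tolist(), edges[1::2].tolist()))
    words = []
    for start, end in runs:
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)  # narrow gap: extend current word
        else:
            words.append((start, end))       # wide gap: start a new word
    return words
```

A fixed min_gap is the simplest choice; a scan-dependent value, for example derived from the typical gap width on each line, would adapt better to different handwriting and resolutions.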

Moreover, the software can split the images into sub-images and generate annotations for them, making them suitable for training a COCO-based neural network.
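The sub-image and annotation step might look roughly like the following sketch. It assumes fixed-size square tiles, a single hypothetical "word" category, and word boxes given as (x, y, width, height) in page pixels; boxes crossing a tile border are simply skipped here, which the actual project may handle differently. The field names follow the COCO object-detection format (bbox as [x, y, width, height], plus area and iscrowd), while tile_annotations, tile and file_stem are illustrative names only.

```python
def tile_annotations(page_w, page_h, word_boxes, tile=512, file_stem="page"):
    """Split a page into tile x tile sub-images and build a COCO-style dict.

    word_boxes: list of (x, y, w, h) boxes in page coordinates.
    tile: hypothetical sub-image size in pixels (edge tiles may be smaller).
    Boxes that do not fit entirely inside one tile are skipped.
    """
    coco = {"images": [], "annotations": [],
            "categories": [{"id": 1, "name": "word"}]}
    image_id, ann_id = 1, 1
    for ty in range(0, page_h, tile):
        for tx in range(0, page_w, tile):
            tw, th = min(tile, page_w - tx), min(tile, page_h - ty)
            coco["images"].append({"id": image_id,
                                   "file_name": f"{file_stem}_{tx}_{ty}.png",
                                   "width": tw, "height": th})
            for x, y, w, h in word_boxes:
                inside = (tx <= x and ty <= y
                          and x + w <= tx + tw and y + h <= ty + th)
                if inside:
                    # Translate the box into the tile's own coordinates.
                    coco["annotations"].append({
                        "id": ann_id, "image_id": image_id, "category_id": 1,
                        "bbox": [x - tx, y - ty, w, h],
                        "area": w * h, "iscrowd": 0})
                    ann_id += 1
            image_id += 1
    return coco
```

The returned dictionary can be written to disk with json.dump and loaded by any tool that reads COCO-style instance annotations.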

Folders and files:
.idea
JsonUtils
analytics
demoImages
frequentWords
results
README.md
angles_34-62.json
annotationsTry.json
binarizer.py
genesis1-20.txt
imageProcessingDemo.py
inPagePositions.json
instances_COCOGenesis.json
results.py
stringUtils.py
utils.py

fmalato/HolySheet