ngawangtrinley/ocr-tests


ocr-tests

Test 1:

The diff between the actual text (left) and the OCR result (json) of this image (right)...

<iframe src="http://prose.io/#ngawangtrinley/starter"></iframe>

https://github.com/ngawangtrinley/ocr-tests/compare/f0035c4...3baffd5

...highlights several types of issues:

  • '༥' (U+0F25) at the end of the header wasn't detected, yet an extra '།' (U+0F0D) appeared at the end of the text.
  • '࿒' (U+0FD2) is replaced by ':' (U+003A) at the start of lines and by '་' (U+0F0B) in the middle of lines.
  • 'ཿ' (U+0F7F) is ignored.
  • Tibetan enclosed alphanumerics (replaced by ①...) aren't detected at all, most probably because they aren't part of the Tibetan Unicode table.
  • A '་' (U+0F0B) has been added between two sentences in line 18, most probably picked up from text showing through from the back of the page.
  • The remaining issues are letter combinations used to transliterate Sanskrit (very common in Buddhist literature), which might not have featured in the training data.
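
Issues like the ones listed above can be surfaced automatically by diffing the reference text against the OCR output character by character and printing the code point of every mismatch. The following is a minimal sketch, not the method used to produce the comparison linked above: the function names `describe`, `char_diff` and `in_tibetan_block` are hypothetical, the sample strings are made up for illustration, and extracting the text from the OCR JSON is assumed to have been done already.

```python
import difflib
import unicodedata


def describe(text):
    """Render each character with its code point and Unicode name."""
    if not text:
        return "(nothing)"
    return ", ".join(
        f"'{ch}' U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in text
    )


def char_diff(expected, actual):
    """Return (op, expected_span, actual_span) for every non-matching region."""
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (op, expected[i1:i2], actual[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def in_tibetan_block(ch):
    """True if the character lies in the Tibetan block, U+0F00 to U+0FFF."""
    return 0x0F00 <= ord(ch) <= 0x0FFF


# Hypothetical sample, not taken from the repository's test files:
# a nyis tsheg (U+0FD2) read as ':' and a rnam bcad (U+0F7F) dropped.
expected = "\u0fd2\u0f40\u0f7f"
actual = ":\u0f40"

for op, want, got in char_diff(expected, actual):
    print(f"{op}: expected {describe(want)} / got {describe(got)}")

# '①' is U+2460, in the Enclosed Alphanumerics block rather than the Tibetan one.
print(in_tibetan_block("\u2460"), in_tibetan_block("\u0f25"))
```

Reporting code points rather than glyphs makes visually similar marks such as '་' and '࿒' easy to tell apart in the output, and the block check gives a quick way to flag characters that fall outside the Tibetan range.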

About

OCR test files
