4,995 Vietnamese OCR Images Data - Images with Annotation and Transcription. The data includes 258 images of natural scenes, 2,553 Internet images, and 2,184 document images. For line-level content annotation, line-level quadrilateral bounding box annotation and text transcription were adopted; for column-level content annotation, column-level quadrilateral bounding box annotation and text transcription were adopted. The data can be used for tasks such as Vietnamese text recognition in multiple scenes.
For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/1059?source=Github
4,995 OCR images, including 258 images of natural scenes, 2,553 Internet images, 2,184 document images
including natural scenes (plaques, packaging instructions, small advertisements, menus, posters, etc.), Internet images (magazine covers, comic covers, etc.), and document images (text documents, etc.)
including multiple scenes, multiple angles, different light conditions
cellphone
upward-looking angle, eye-level angle
the image data format is .jpg, the annotated file format is .json
line-level quadrilateral bounding box annotation and transcription for the texts; column-level quadrilateral bounding box annotation and transcription for the texts
a quadrilateral bounding box is a qualified annotation when the error of each of its vertices is within 10 pixels; the accuracy of bounding boxes is not less than 97%; the text transcription accuracy is not less than 97%
Commercial License
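
As an illustration of the 10-pixel vertex criterion stated above, the following is a minimal sketch of how a quadrilateral annotation might be checked against a reference box. It is not the vendor's quality-control procedure. The names `is_qualified` and `box_accuracy`, the representation of a quadrilateral as four (x, y) vertices in matching order, and the use of Euclidean distance for the per-vertex error are all assumptions made for this example; the layout of the dataset's .json files is not described here.

```python
from math import hypot

# Per-vertex error bound taken from the description above.
TOLERANCE_PX = 10


def is_qualified(annotated, reference, tol=TOLERANCE_PX):
    """Return True if every annotated vertex is within `tol` pixels
    of the corresponding reference vertex.

    Assumption: both quadrilaterals are sequences of four (x, y)
    pairs listed in the same vertex order, and the error is measured
    as Euclidean distance.
    """
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("expected four vertices per quadrilateral")
    return all(
        hypot(ax - rx, ay - ry) <= tol
        for (ax, ay), (rx, ry) in zip(annotated, reference)
    )


def box_accuracy(pairs, tol=TOLERANCE_PX):
    """Fraction of (annotated, reference) pairs that are qualified."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no boxes to evaluate")
    qualified = sum(1 for ann, ref in pairs if is_qualified(ann, ref, tol))
    return qualified / len(pairs)


# Hypothetical example data, not taken from the dataset.
reference = [(0, 0), (100, 0), (100, 20), (0, 20)]
close_box = [(3, 4), (100, 0), (98, 21), (0, 20)]   # largest vertex error is 5 px
far_box = [(0, 11), (100, 0), (100, 20), (0, 20)]   # first vertex is 11 px off

print(is_qualified(close_box, reference))
print(is_qualified(far_box, reference))
print(box_accuracy([(close_box, reference), (far_box, reference)]))
```

Under these assumptions, a set of boxes would meet the stated bar when `box_accuracy` returns at least 0.97.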