This repo hosts the BTS dataset from the following paper:
Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, Xiaohu Qie, BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild
Summary of license permissions:
Our dataset is now fully released for academic use. The researcher shall use the BTS dataset only for non-commercial algorithm research and educational purposes. Except for the above purposes, the researcher may not use the BTS dataset for any other purposes, including but not limited to distribution, commercial use, advertising, etc.
You can download the dataset from the following link, only if you agree to the above permissions.
https://drive.weixin.qq.com/s?k=AJEAIQdfAAofh5N4rQ
The key motivation behind the scene selection is to ensure the representativeness and generalization of the dataset.
- First, the dataset includes both indoor and outdoor images to balance lighting conditions.
- Second, the variety of text-line appearance is another important factor, i.e., text lines in different orientations (vertical and horizontal text in couplets and textbooks) and curved shapes (some of the signboards).
- Third, font diversity: the dataset covers printed fonts in textbooks and artistic fonts on signboards.
We believe variety along these three dimensions allows segmentation models to be well trained and to generalize better.
BTS excludes algorithms and out-of-the-box models from the labeling process to prevent bad labeling cases. The annotation workflow is as follows.
- Image cleaning. Unqualified examples, such as blurry images with unrecognizable characters and strokes, are filtered out.
- Manual annotation. All images in BTS are manually annotated at three levels: pixel-level, character-level, and line-level. Photoshop is the main tool; its pencil tool helps annotators label pixel-level text masks.
- Two rounds of quality checks. During labeling, annotators cross-check each other's annotations; after labeling, several professional researchers double-check them.
This workflow ensures that all annotations are of relatively high quality and that the benchmark is highly reliable.
BTS contains 14,250 images.
The distribution across scenes is nearly balanced and consistent with the real-world distribution.
A full download should contain these files:
- BTS_VAL.zip contains 10,188 images.
- BTS_TRAIN.zip contains 2,696 images.
- BTS_TEST.zip contains 1,366 images.
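After downloading, a quick sanity check of the archives can help catch truncated downloads. Below is a minimal Python sketch; the expected counts mirror the list above, and the assumption that images are stored as .jpg entries inside each archive follows the folder layout described in the next section.

```python
# Sanity-check sketch: count .jpg entries in each downloaded archive and
# compare against the figures listed above. The archive names and counts come
# from this README; the .jpg extension is an assumption based on the file
# naming convention described below.
import zipfile

expected = {
    "BTS_VAL.zip": 10188,
    "BTS_TRAIN.zip": 2696,
    "BTS_TEST.zip": 1366,
}

for archive, count in expected.items():
    with zipfile.ZipFile(archive) as zf:
        n_images = sum(1 for name in zf.namelist() if name.lower().endswith(".jpg"))
    status = "OK" if n_images == count else "MISMATCH"
    print(f"{archive}: {n_images} images (expected {count}) -> {status}")
```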
Each zip package contains three folders:
- image: the original images, named [SceneID]_[SampleID].jpg
- bpoly_label: word-level and char-level labels corresponding to the images, named [SceneID]_[SampleID]_anno.json
- semantic_label: mask labels corresponding to the images, named [SceneID]_[SampleID]_maskfg.png
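For illustration, here is a minimal Python sketch that loads one sample using the folder layout and naming convention above. The extraction directory, the sample ID, and the use of PIL are assumptions; the internal schema of the annotation JSON is not documented here, so the sketch only inspects its top-level keys.

```python
# Loading sketch, assuming an archive has been extracted to DATA_ROOT and that
# the folders follow the layout above. DATA_ROOT and SAMPLE_ID are hypothetical.
import json
from pathlib import Path

from PIL import Image

DATA_ROOT = Path("BTS_TRAIN")      # hypothetical extraction directory
SAMPLE_ID = "0001_0001"            # hypothetical [SceneID]_[SampleID]

image = Image.open(DATA_ROOT / "image" / f"{SAMPLE_ID}.jpg")
mask = Image.open(DATA_ROOT / "semantic_label" / f"{SAMPLE_ID}_maskfg.png")
with open(DATA_ROOT / "bpoly_label" / f"{SAMPLE_ID}_anno.json", encoding="utf-8") as f:
    annotation = json.load(f)

print("image size:", image.size)                 # original image resolution
print("mask size:", mask.size)                   # pixel-level text mask resolution
print("annotation keys:", list(annotation)[:5])  # word/char-level label structure
```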
In the table below, we compare BTS with a variety of representative datasets.
| Dataset | Text Type | Images | Words | Chars | Masks | Char Classes | Language |
|---|---|---|---|---|---|---|---|
| ICDAR13 FST | Scene | 462 | 1,944 | 6,620 | Word, Char | 36 | English |
| COCO_TS | Scene | 14,690 | 139,034 | - | Word | 36 | English |
| MLT_S | Scene | 6,896 | 30,691 | - | Word | 36 | English |
| Total-Text | Scene | 1,555 | 9,330 | - | Word | 36 | English |
| TextSeg | Scene + Design | 4,024 | 15,691 | 73,790 | Word, Word-Effect, Char | 36 | English |
| BTS (Ours) | Scene | 14,250 | 44,280 | 209,090 | Word, Char | 3,985 | Bi-lingual |