Data Preparation

Step 1. Download


To train Hi-SAM for hierarchical text segmentation, download the training ground-truth (GT) JSON file. It is derived from the GT in the HierText repo using HierText/process_gt.py.
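Before training, it can help to confirm that the downloaded file is complete and parses as JSON. The following is a minimal sketch using Python's standard json module. `load_gt` is a hypothetical helper, not part of the Hi-SAM repo. The file's internal schema is not described here, so the sketch stops at parsing.

```python
import json
from pathlib import Path


def load_gt(path):
    """Load a ground-truth JSON file (hypothetical helper, not from the Hi-SAM repo).

    Raises FileNotFoundError if the file is missing and json.JSONDecodeError
    if it is not valid JSON, e.g. because the download was cut off.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"GT file not found: {path}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `gt = load_gt("HierText/train_shrink_vert.json")`. This assumes the downloaded file is the train_shrink_vert.json shown in the directory layout under Step 2. Adjust the path if your copy is named or placed differently.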

Step 2. Process & Organize

(1) For Total-Text, rename groundtruth_pixel/Train/img61.JPG to groundtruth_pixel/Train/img61.jpg.

(2) For TextSeg, use TextSeg/process_textseg.py to split the original data.

(3) Organize the datasets into the following structure:

|- HierText
|  |- train
|  |- train_gt
|  |- validation
|  |- validation_gt
|  |- test
|  |- test_gt
|  └  train_shrink_vert.json
|- TotalText
|  |- groundtruth_pixel
|  |  |- Test
|  |  └  Train
|  └  Images
|     |- Test
|     └  Train
|- TextSeg
|  |- train_images
|  |- train_gt
|  |- val_images
|  |- val_gt
|  |- test_images
|  └  test_gt
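You can apply the rename in (1) and check the layout in (3) with a short script. This is a minimal sketch, not part of the Hi-SAM repo. The function names `fix_totaltext_extension` and `missing_entries` are hypothetical. The expected entries are copied from the tree above, and the dataset root is assumed to be the directory that contains HierText, TotalText and TextSeg.

```python
from pathlib import Path

# Expected entries under each dataset folder, copied from the directory tree above.
# Helper names in this sketch are hypothetical, not part of the Hi-SAM codebase.
EXPECTED = {
    "HierText": [
        "train", "train_gt", "validation", "validation_gt",
        "test", "test_gt", "train_shrink_vert.json",
    ],
    "TotalText": [
        "groundtruth_pixel/Test", "groundtruth_pixel/Train",
        "Images/Test", "Images/Train",
    ],
    "TextSeg": [
        "train_images", "train_gt", "val_images", "val_gt",
        "test_images", "test_gt",
    ],
}


def fix_totaltext_extension(root):
    """Rename TotalText/groundtruth_pixel/Train/img61.JPG to img61.jpg.

    Returns True if a rename happened, False if there was nothing to do.
    Exact names are read from the directory listing, because exists() cannot
    tell .JPG from .jpg on case-insensitive filesystems.
    """
    train_dir = Path(root) / "TotalText" / "groundtruth_pixel" / "Train"
    if not train_dir.is_dir():
        return False
    names = {p.name for p in train_dir.iterdir()}
    if "img61.JPG" in names and "img61.jpg" not in names:
        (train_dir / "img61.JPG").rename(train_dir / "img61.jpg")
        return True
    return False


def missing_entries(root):
    """Return the expected paths (relative to root) that do not exist yet."""
    root = Path(root)
    return [
        f"{dataset}/{rel}"
        for dataset, rels in EXPECTED.items()
        for rel in rels
        if not (root / dataset / rel).exists()
    ]
```

For example, call `fix_totaltext_extension("data")` and then `print(missing_entries("data"))`, where `data` is a placeholder for your dataset root. An empty list means every directory and file in the tree is present. The check covers only whether each entry exists. It does not inspect the contents of each split.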