ICDAR 2015 Official Website | Download Link
Note: Please register an account to download this dataset.
The ICDAR 2015 Challenge provides three tasks: Task 1 (Text Localization), Task 3 (Word Recognition), and Task 4 (End-to-End Text Spotting). Task 2 (Text Segmentation) is not available.
The four files to download for Task 1 are:
ch4_training_images.zip
ch4_training_localization_transcription_gt.zip
ch4_test_images.zip
Challenge4_Test_Task1_GT.zip
The three files to download for Task 3 are:
ch4_training_word_images_gt.zip
ch4_test_word_images_gt.zip
Challenge4_Test_Task3_GT.txt
These three files are only needed for training word recognition models. Training text detection models does not require them.
The nine files to download for Task 4 are the four files from the text localization task (Task 1) plus five vocabulary files:
ch4_training_vocabulary.txt
ch4_training_vocabularies_per_image.zip
ch4_test_vocabulary.txt
ch4_test_vocabularies_per_image.zip
GenericVocabulary.txt
If you download a file named Challenge4_Test_Task4_GT.zip, please note that it is the same file as Challenge4_Test_Task1_GT.zip, except for its name. In this repository, we will use Challenge4_Test_Task4_GT.zip for the ICDAR 2015 dataset.
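If only the Task 1 archive was downloaded, one way to obtain the name used here is to copy it under the Task 4 name. The following is a minimal sketch, not part of this repository: the helper name `ensure_task4_gt` and its directory argument are hypothetical, and it relies only on the statement above that the two archives are identical apart from their names.

```python
import shutil
from pathlib import Path

TASK1_GT = "Challenge4_Test_Task1_GT.zip"
TASK4_GT = "Challenge4_Test_Task4_GT.zip"


def ensure_task4_gt(ic15_dir):
    """Return the path of the Task 4 ground-truth archive.

    Hypothetical helper: if only the Task 1 archive is present, copy it
    under the Task 4 name. The original file is left untouched.
    """
    ic15_dir = Path(ic15_dir)
    task1 = ic15_dir / TASK1_GT
    task4 = ic15_dir / TASK4_GT
    if not task4.exists() and task1.exists():
        shutil.copyfile(task1, task4)
    return task4
```

Copying rather than renaming keeps the Task 1 archive available under its original name.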
After downloading the ICDAR 2015 dataset, place all the files under the [path-to-data-dir] folder as follows:
path-to-data-dir/
    ic15/
        ch4_test_images.zip
        ch4_test_vocabularies_per_image.zip
        ch4_test_vocabulary.txt
        ch4_training_images.zip
        ch4_training_localization_transcription_gt.zip
        ch4_training_vocabularies_per_image.zip
        ch4_training_vocabulary.txt
        Challenge4_Test_Task4_GT.zip
        GenericVocabulary.txt
        ch4_test_word_images_gt.zip
        ch4_training_word_images_gt.zip
        Challenge4_Test_Task3_GT.txt
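
Before running any preprocessing, it can help to confirm that the files are in place. The sketch below is an illustration only and not part of this repository: `missing_ic15_files` and `TASK4_FILES` are hypothetical names, and the default list covers the nine files used for text detection and end-to-end spotting. The word recognition files can be checked the same way by passing a different list.

```python
from pathlib import Path

# The nine Task 4 files as they appear in the layout above (assumed list).
TASK4_FILES = [
    "ch4_training_images.zip",
    "ch4_training_localization_transcription_gt.zip",
    "ch4_test_images.zip",
    "Challenge4_Test_Task4_GT.zip",
    "ch4_training_vocabulary.txt",
    "ch4_training_vocabularies_per_image.zip",
    "ch4_test_vocabulary.txt",
    "ch4_test_vocabularies_per_image.zip",
    "GenericVocabulary.txt",
]


def missing_ic15_files(data_dir, expected=TASK4_FILES):
    """Return the expected file names that are absent from <data_dir>/ic15."""
    ic15 = Path(data_dir) / "ic15"
    return [name for name in expected if not (ic15 / name).is_file()]
```

For example, `missing_ic15_files("path-to-data-dir")` returns an empty list when all nine files are present, and otherwise the names that still need to be downloaded or moved.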