This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except standard non-maximum suppression. For more details, please refer to our paper.
Please cite TextBoxes in your publications if it helps your research:
```
@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and
               Baoguang Shi and
               Xiang Bai and
               Xinggang Wang and
               Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}
```
- Get the code. We will call the directory that you cloned TextBoxes into `$CAFFE_ROOT` (the repository is a fork of Caffe). Build the library and the Python bindings, then verify the bindings with the snippet below.

```shell
git clone https://github.com/MhLiao/TextBoxes.git
cd TextBoxes
# Configure the build for your environment (standard Caffe build step).
cp Makefile.config.example Makefile.config
make -j8
make py
```
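If the build succeeds, a quick sanity check is to import the Python bindings. This is a minimal sketch; it assumes you run it from `$CAFFE_ROOT` so that the `python` subdirectory is importable (otherwise add that directory to `PYTHONPATH`).

```python
import sys
sys.path.insert(0, 'python')  # path to $CAFFE_ROOT/python; adjust if running from elsewhere

import caffe
caffe.set_mode_cpu()          # or caffe.set_device(0); caffe.set_mode_gpu()
print('pycaffe loaded from', caffe.__file__)
```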
- Models trained on ICDAR 2013: Dropbox link BaiduYun link
- Fully convolutional reduced (atrous) VGGNet: Dropbox link BaiduYun link
- Compiled MEX file for evaluation (used by the multi-scale test evaluation script evaluation_nms.m): Dropbox link BaiduYun link
- Download the ICDAR 2013 dataset
- Download the models trained on ICDAR 2013 (see the links above)
- Modify the relevant paths in "examples/TextBoxes/test_icdar13.py"
- Run "python examples/TextBoxes/test_icdar13.py" (a pycaffe sketch of what such a test script does is given after this list)
- For multi-scale testing, use "test_icdar13_multi_scale.py" together with "evaluation_nms.m", which performs the non-maximum suppression over the multi-scale results
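If you prefer to script the single-scale test yourself instead of editing test_icdar13.py, the sketch below shows the general flow: load the network with pycaffe, preprocess an image, and read the SSD-style detection output. The prototxt/caffemodel paths, the output blob name, and the 0.6 confidence threshold are assumptions to adapt to the downloaded model files, not the exact contents of the provided script.

```python
import numpy as np
import caffe

# Placeholder paths; point these at the deploy prototxt and the ICDAR 2013 model you downloaded.
model_def = 'models/TextBoxes_icdar13/deploy.prototxt'
model_weights = 'models/TextBoxes_icdar13/TextBoxes_icdar13.caffemodel'

caffe.set_mode_gpu()
net = caffe.Net(model_def, model_weights, caffe.TEST)

# Preprocess: resize to the network input size, subtract the VGG mean, convert HxWxC RGB -> CxHxW BGR.
image = caffe.io.load_image('demo.jpg')          # RGB, float in [0, 1]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104.0, 117.0, 123.0]))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

net.blobs['data'].data[...] = transformer.preprocess('data', image)
detections = net.forward()['detection_out']      # SSD-style rows: [image_id, label, score, xmin, ymin, xmax, ymax]

h, w = image.shape[:2]
for det in detections[0, 0]:
    score = det[2]
    if score < 0.6:                              # confidence threshold (assumption)
        continue
    xmin, ymin, xmax, ymax = det[3] * w, det[4] * h, det[5] * w, det[6] * h
    print('text box: (%.1f, %.1f, %.1f, %.1f), score %.2f' % (xmin, ymin, xmax, ymax, score))
```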
- Train for about 50k iterations on the synthetic data referred to in the paper.
- Fine-tune for about 2k iterations on the corresponding real training data, such as ICDAR 2013 or SVT.
- For more details, such as the learning rate settings, please refer to the paper; a pycaffe sketch of this two-stage schedule follows this list.
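The two-stage schedule above can be driven from pycaffe; the sketch below is one possible setup, not the exact training script from the repository. The solver prototxt names and output file names are placeholders, the reduced VGG-16 weights are the ones linked above, and the learning rates and other solver settings should follow the paper.

```python
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

# Stage 1: pre-train on the synthetic data (solver path is a placeholder).
solver = caffe.SGDSolver('models/TextBoxes/solver_synth.prototxt')
solver.net.copy_from('models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel')  # reduced VGG init
solver.step(50000)                                  # ~50k iterations
solver.net.save('textboxes_synth.caffemodel')

# Stage 2: fine-tune on the target training set, e.g. ICDAR 2013 (paths are placeholders).
finetune = caffe.SGDSolver('models/TextBoxes/solver_icdar13.prototxt')
finetune.net.copy_from('textboxes_synth.caffemodel')
finetune.step(2000)                                 # ~2k iterations
finetune.net.save('textboxes_icdar13.caffemodel')
```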
- Using the provided test code, you can achieve an F-measure of about 80% on ICDAR 2013 at a single scale.
- Using the provided multi-scale test code with non-maximum suppression, you can achieve an F-measure of about 85% on ICDAR 2013 (F-measure is the harmonic mean of precision and recall; a small illustration follows this list).
- For more performance details, please refer to the paper and to Task 1 and Task 4 of Challenge 2 on the ICDAR 2015 website: http://rrc.cvc.uab.es/?ch=2&com=evaluation
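For reference, the F-measure quoted above is the standard harmonic mean of precision and recall; the numbers in the example below are illustrative only, not results from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Illustrative values only: precision 0.88 and recall 0.74 give an F-measure of about 0.80.
print(f_measure(0.88, 0.74))
```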
Please let me know if you encounter any issues.