Algorithms

This document lists the OCR algorithms supported by PaddleOCR, along with the models and metrics of each algorithm on public English datasets. It is intended mainly for introducing the algorithms and comparing their performance. For models trained on other datasets, including Chinese, please refer to the PP-OCRv3 models list.

Developers are welcome to contribute more algorithms! Please refer to the guideline for adding a new algorithm.

1. Two-stage OCR Algorithms

1.1 Text Detection Algorithms

Supported text detection algorithms (Click the link to get the tutorial):

On the ICDAR2015 dataset, the text detection results are as follows:

| Model | Backbone | Precision | Recall | Hmean | Download link |
| --- | --- | --- | --- | --- | --- |
| EAST | ResNet50_vd | 88.71% | 81.36% | 84.88% | trained model |
| EAST | MobileNetV3 | 78.20% | 79.10% | 78.65% | trained model |
| DB | ResNet50_vd | 86.41% | 78.72% | 82.38% | trained model |
| DB | MobileNetV3 | 77.29% | 73.08% | 75.12% | trained model |
| SAST | ResNet50_vd | 91.39% | 83.77% | 87.42% | trained model |
| PSE | ResNet50_vd | 85.81% | 79.53% | 82.55% | trained model |
| PSE | MobileNetV3 | 82.20% | 70.48% | 75.89% | trained model |
| DB++ | ResNet50 | 90.89% | 82.66% | 86.58% | pretrained model/trained model |
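The Hmean column can be cross-checked against the Precision and Recall columns. The sketch below assumes Hmean is the usual harmonic mean of precision and recall; the function name `hmean` is illustrative and not part of PaddleOCR. Differences of about 0.01 from the reported values are expected, because the table's precision and recall are already rounded.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the ICDAR2015 table.
rows = {
    "EAST ResNet50_vd": (88.71, 81.36, 84.88),
    "SAST ResNet50_vd": (91.39, 83.77, 87.42),
    "DB++ ResNet50": (90.89, 82.66, 86.58),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {hmean(p, r):.2f}%, reported {reported:.2f}%")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model with a large gap between precision and recall scores lower than one with the same arithmetic average but balanced values.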

On the Total-Text dataset, the text detection results are as follows:

| Model | Backbone | Precision | Recall | Hmean | Download link |
| --- | --- | --- | --- | --- | --- |
| SAST | ResNet50_vd | 89.63% | 78.44% | 83.66% | trained model |
| CT | ResNet18_vd | 88.68% | 81.70% | 85.05% | trained model |

On the CTW1500 dataset, the text detection results are as follows:

| Model | Backbone | Precision | Recall | Hmean | Download link |
| --- | --- | --- | --- | --- | --- |
| FCE | ResNet50_dcn | 88.39% | 82.18% | 85.27% | trained model |
| DRRG | ResNet50_vd | 89.92% | 80.91% | 85.18% | trained model |

Note: Additional data, such as ICDAR2013, ICDAR2017, COCO-Text, and ArT, was used when training SAST. Download the English public datasets, in the organized format used by PaddleOCR, from:

1.2 Text Recognition Algorithms

Supported text recognition algorithms (Click the link to get the tutorial):

Following DTRB, the text recognition algorithms above are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:

| Model | Backbone | Avg Accuracy | Module combination | Download link |
| --- | --- | --- | --- | --- |
| Rosetta | Resnet34_vd | 79.11% | rec_r34_vd_none_none_ctc | trained model |
| Rosetta | MobileNetV3 | 75.80% | rec_mv3_none_none_ctc | trained model |
| CRNN | Resnet34_vd | 81.04% | rec_r34_vd_none_bilstm_ctc | trained model |
| CRNN | MobileNetV3 | 77.95% | rec_mv3_none_bilstm_ctc | trained model |
| StarNet | Resnet34_vd | 82.85% | rec_r34_vd_tps_bilstm_ctc | trained model |
| StarNet | MobileNetV3 | 79.28% | rec_mv3_tps_bilstm_ctc | trained model |
| RARE | Resnet34_vd | 83.98% | rec_r34_vd_tps_bilstm_att | trained model |
| RARE | MobileNetV3 | 81.76% | rec_mv3_tps_bilstm_att | trained model |
| SRN | Resnet50_vd_fpn | 86.31% | rec_r50fpn_vd_none_srn | trained model |
| NRTR | NRTR_MTB | 84.21% | rec_mtb_nrtr | trained model |
| SAR | Resnet31 | 87.20% | rec_r31_sar | trained model |
| SEED | Aster_Resnet | 85.35% | rec_resnet_stn_bilstm_att | trained model |
| SVTR | SVTR-Tiny | 89.25% | rec_svtr_tiny_none_ctc_en | trained model |
| ViTSTR | ViTSTR | 79.82% | rec_vitstr_none_ce | trained model |
| ABINet | Resnet45 | 90.75% | rec_r45_abinet | trained model |
| VisionLAN | Resnet45 | 90.30% | rec_r45_visionlan | trained model |
| SPIN | ResNet32 | 90.00% | rec_r32_gaspin_bilstm_att | trained model |
| RobustScanner | ResNet31 | 87.77% | rec_r31_robustscanner | trained model |
| RFL | ResNetRFL | 88.63% | rec_resnet_rfl_att | trained model |
| ParseQ | VIT | 91.24% | rec_vit_parseq_synth | trained model |
| CPPD | SVTR-Base | 93.8% | rec_svtrnet_cppd_base_en | trained model |
| SATRN | ShallowCNN | 88.05% | rec_satrn | trained model |
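Several entries in the Module combination column (the Rosetta, CRNN, StarNet, and RARE rows) appear to follow the pattern `rec_<backbone>_<transform>_<sequence>_<prediction>`, which mirrors the four-stage breakdown used by DTRB. That reading is an inference from the names themselves, not something this document states, and names such as `rec_r31_sar` or `rec_svtr_tiny_none_ctc_en` do not fit it. Under that assumption, a hypothetical helper for splitting such names could look like this:

```python
from typing import NamedTuple


class ModuleCombination(NamedTuple):
    backbone: str
    transform: str
    sequence: str
    prediction: str


def parse_module_combination(name: str) -> ModuleCombination:
    """Split e.g. 'rec_r34_vd_tps_bilstm_att' into its four stages.

    Assumes the pattern rec_<backbone>_<transform>_<sequence>_<prediction>,
    where only the backbone part may itself contain underscores.
    """
    parts = name.split("_")
    if len(parts) < 5 or parts[0] != "rec":
        raise ValueError(f"unexpected module combination name: {name!r}")
    # The last three tokens are the fixed stages; the rest is the backbone.
    *backbone, transform, sequence, prediction = parts[1:]
    return ModuleCombination("_".join(backbone), transform, sequence, prediction)


for name in ("rec_mv3_none_none_ctc", "rec_r34_vd_tps_bilstm_att"):
    print(name, "->", parse_module_combination(name))
```

Read this way, the difference between CRNN and StarNet in the table is only the transform stage (`none` versus `tps`), and RARE differs from StarNet only in the prediction stage (`att` versus `ctc`).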

1.3 Text Super-Resolution Algorithms

Supported text super-resolution algorithms (Click the link to get the tutorial):

On the TextZoom public dataset, the results are as follows:

| Model | Backbone | PSNR_Avg | SSIM_Avg | Config | Download link |
| --- | --- | --- | --- | --- | --- |
| Text Gestalt | tsrn | 19.28 | 0.6560 | configs/sr/sr_tsrn_transformer_strock.yml | trained model |
| Text Telescope | tbsrn | 21.56 | 0.7411 | configs/sr/sr_telescope.yml | trained model |
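For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard peak signal-to-noise ratio, in dB, between a reference image and a restored one. It assumes 8-bit grayscale images given as nested lists. The exact preprocessing PaddleOCR applies before computing PSNR_Avg (channels, value range, averaging over the test subsets) is not described here and may differ.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = 0.0
    count = 0
    for ref_row, out_row in zip(reference, restored):
        for ref_px, out_px in zip(ref_row, out_row):
            squared_error += (ref_px - out_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Tiny illustrative 2x2 images (made-up pixel values).
high_res = [[52, 55], [61, 59]]
restored = [[50, 57], [60, 62]]
print(f"PSNR: {psnr(high_res, restored):.2f} dB")
```

Higher PSNR means the restored image is closer to the reference pixel by pixel; SSIM complements it by comparing local structure rather than raw error.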

1.4 Formula Recognition Algorithms

Supported formula recognition algorithms (Click the link to get the tutorial):

On the CROHME handwritten formula dataset, the results are as follows:

| Model | Backbone | Config | ExpRate | Download link |
| --- | --- | --- | --- | --- |
| CAN | DenseNet | rec_d28_can.yml | 51.72% | trained model |

On the LaTeX-OCR printed formula dataset, the results are as follows:

| Model | Backbone | Config | BLEU score | Normed edit distance | ExpRate | Download link |
| --- | --- | --- | --- | --- | --- | --- |
| LaTeX-OCR | Hybrid ViT | rec_latex_ocr.yml | 0.8821 | 0.0823 | 40.01% | trained model |
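Two of the formula metrics above can be illustrated on token sequences. The sketch below assumes ExpRate is the fraction of predicted expressions that match the reference exactly, and that the normed edit distance is the Levenshtein distance divided by the length of the longer sequence. Both definitions are common conventions rather than something this document specifies, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (e.g. lists of LaTeX tokens)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normed_edit_distance(prediction, reference):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(prediction), len(reference))
    return edit_distance(prediction, reference) / longest if longest else 0.0


def exp_rate(predictions, references):
    """Fraction of predictions identical to their reference."""
    if not references:
        raise ValueError("references must not be empty")
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


# Made-up examples: one exact match, one single-token error.
refs = [["x", "^", "2"], ["\\frac", "{", "a", "}", "{", "b", "}"]]
preds = [["x", "^", "2"], ["\\frac", "{", "a", "}", "{", "c", "}"]]
print("ExpRate:", exp_rate(preds, refs))
print("Normed edit distance:", normed_edit_distance(preds[1], refs[1]))
```

ExpRate is strict: a single wrong token makes the whole expression count as a miss, which helps explain why a model can have a low edit distance and a much lower ExpRate at the same time.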

2. End-to-end OCR Algorithms

Supported end-to-end algorithms (Click the link to get the tutorial):

3. Table Recognition Algorithms

Supported table recognition algorithms (Click the link to get the tutorial):

On the PubTabNet dataset, the results are as follows:

| Model | Backbone | Config | Acc | Download link |
| --- | --- | --- | --- | --- |
| TableMaster | TableResNetExtra | configs/table/table_master.yml | 77.47% | trained model / inference model |

4. Key Information Extraction Algorithms

Supported KIE algorithms (Click the link to get the tutorial):

On the wildreceipt dataset, the results are as follows:

| Model | Backbone | Config | Hmean | Download link |
| --- | --- | --- | --- | --- |
| SDMGR | VGG6 | configs/kie/sdmgr/kie_unet_sdmgr.yml | 86.70% | trained model |

On the XFUND_zh dataset, the results are as follows:

| Model | Backbone | Task | Config | Hmean | Download link |
| --- | --- | --- | --- | --- | --- |
| VI-LayoutXLM | VI-LayoutXLM-base | SER | ser_vi_layoutxlm_xfund_zh_udml.yml | 93.19% | trained model |
| LayoutXLM | LayoutXLM-base | SER | ser_layoutxlm_xfund_zh.yml | 90.38% | trained model |
| LayoutLM | LayoutLM-base | SER | ser_layoutlm_xfund_zh.yml | 77.31% | trained model |
| LayoutLMv2 | LayoutLMv2-base | SER | ser_layoutlmv2_xfund_zh.yml | 85.44% | trained model |
| VI-LayoutXLM | VI-LayoutXLM-base | RE | re_vi_layoutxlm_xfund_zh_udml.yml | 83.92% | trained model |
| LayoutXLM | LayoutXLM-base | RE | re_layoutxlm_xfund_zh.yml | 74.83% | trained model |
| LayoutLMv2 | LayoutLMv2-base | RE | re_layoutlmv2_xfund_zh.yml | 67.77% | trained model |