Merge branch 'PaddlePaddle:main' into setup2toml

Liyulingyue · May 23, 2024 · 9e5e2c7
2 parents 9baed76 + e73eb76

Showing 26 changed files with 407 additions and 376 deletions.
diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/codestyle.yml
@@ -1,16 +1,18 @@
-name: pre-commit
+name: PaddleOCR Code Style Check
 
 on:
   pull_request:
   push:
     branches: ['main', 'release/*']
 
 jobs:
-  pre-commit:
+  check-code-style:
     runs-on: ubuntu-latest
     steps:
-    - uses: actions/checkout@v3
-    - uses: actions/setup-python@v3
+    - uses: actions/checkout@v4
+      with:
+        ref: ${{ github.ref }}
+    - uses: actions/setup-python@v5
       with:
         python-version: '3.10'
     # Install Dependencies for Python

diff --git a/.github/workflows/tests.yaml b/.github/workflows/tests.yaml
@@ -0,0 +1,30 @@
+name: PaddleOCR PR Tests
+
+on:
+  push:
+  pull_request:
+    branches: ["main", "release/*"]
+
+permissions:
+  contents: read
+
+jobs:
+  test-pr:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v4
+    - name: Set up Python 3.10
+      uses: actions/setup-python@v5
+      with:
+        python-version: "3.10"
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install pytest
+        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+        pip install "paddlepaddle==2.5" requests
+        pip install -e .
+    - name: Test with pytest
+      run: |
+        pytest tests/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -35,3 +35,16 @@ repos:
     hooks:
     -   id: black
         files: (.*\.(py|pyi|bzl)|BUILD|.*\.BUILD|WORKSPACE)$
+
+# Flake8
+-   repo: https://github.com/pycqa/flake8
+    rev: 7.0.0
+    hooks:
+    -   id: flake8
+        args:
+            - --count
+            - --select=E9,F63,F7,F82
+            - --show-source
+            - --statistics
+        exclude: ^benchmark/|^test_tipc/
+
diff --git a/README.md b/README.md
diff --git a/README_ch.md b/README_ch.md
diff --git a/README_en.md b/README_en.md
@@ -1,4 +1,4 @@
-English | [简体中文](README_ch.md) | [हिन्दी](./doc/doc_i18n/README_हिन्द.md) | [日本語](./doc/doc_i18n/README_日本語.md) | [한국인](./doc/doc_i18n/README_한국어.md) | [Pу́сский язы́к](./doc/doc_i18n/README_Ру́сский_язы́к.md)
+English | [简体中文](README.md) | [हिन्दी](./doc/doc_i18n/README_हिन्द.md) | [日本語](./doc/doc_i18n/README_日本語.md) | [한국인](./doc/doc_i18n/README_한국어.md) | [Pу́сский язы́к](./doc/doc_i18n/README_Ру́сский_язы́к.md)
 
 <p align="center">
  <img src="./doc/PaddleOCR_log.png" align="middle" width = "600"/>
@@ -25,6 +25,9 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
     <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00006737.jpg" width="800">
 </div>
 
+## 🚀 Community
+PaddleOCR is being oversight by a [PMC](https://github.com/PaddlePaddle/PaddleOCR/issues/12122). Issues and PRs will be reviewed on a best-effort basis. For a complete overview of PaddlePaddle community, please visit [community](https://github.com/PaddlePaddle/community).
+
 ## 📣 Recent updates
 - **🔥2023.8.7 Release PaddleOCR[release/2.7](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.7)**
   - Release [PP-OCRv4](./doc/doc_ch/PP-OCRv4_introduction.md), support mobile version and server version
@@ -56,7 +59,6 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
 
 
 ## 🌟 Features
-
 PaddleOCR support a variety of cutting-edge algorithms related to OCR, and developed industrial featured models/solution [PP-OCR](./doc/doc_en/ppocr_introduction_en.md)、 [PP-Structure](./ppstructure/README.md) and [PP-ChatOCR](https://aistudio.baidu.com/aistudio/projectdetail/6488689) on this basis, and get through the whole process of data production, model training, compression, inference and deployment.
 
 <div align="center">
@@ -67,7 +69,6 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 
 
 ## ⚡ Quick Experience
-
 - Web online experience
     - PP-OCRv4 online experience：https://aistudio.baidu.com/aistudio/projectdetail/6611435
     - PP-ChatOCR online experience：https://aistudio.baidu.com/aistudio/projectdetail/6488689
@@ -77,38 +78,14 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
     - PP-ChatOCR：https://aistudio.baidu.com/aistudio/modelsdetail?modelId=332
 - Mobile demo experience：[Installation DEMO](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite)(Based on EasyEdge and Paddle-Lite, support iOS and Android systems)
 
-<a name="Technical exchange and cooperation"></a>
-
 ## 📖 Technical exchange and cooperation
-- ([PaddleX](http://10.136.157.23:8080/paddle/paddleX))provides a one-stop full-process high-efficiency development platform for flying paddle ecological model training, pressure, and push. Its mission is to help AI technology quickly land, and its vision is to make everyone an AI Developer!
+- PaddleX provides a one-stop full-process high-efficiency development platform for flying paddle ecological model training, pressure, and push. Its mission is to help AI technology quickly land, and its vision is to make everyone an AI Developer!
    - PaddleX currently covers areas such as image classification, object detection, image segmentation, 3D, OCR, and time series prediction, and has built-in 36 basic single models, such as RP-DETR, PP-YOLOE, PP-HGNet, PP-LCNet, PP- LiteSeg, etc.; integrated 12 practical industrial solutions, such as PP-OCRv4, PP-ChatOCR, PP-ShiTu, PP-TS, vehicle-mounted road waste detection, identification of prohibited wildlife products, etc.
    - PaddleX provides two AI development modes: "Toolbox" and "Developer". The toolbox mode can tune key hyperparameters without code, and the developer mode can perform single-model training, push and multi-model serial inference with low code, and supports both cloud and local terminals.
    - PaddleX also supports joint innovation and development, profit sharing! At present, PaddleX is rapidly iterating, and welcomes the participation of individual developers and enterprise developers to create a prosperous AI technology ecosystem!
 
-Scan the QR code below on WeChat to add operation students, and reply [paddlex], operation students will invite you to join the official communication group for more efficient questions and answers.
-
-<div align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus_paddlex.jpg"  width = "150" height = "150",caption='' />
-<p>[PaddleX] technology exchange group QR code</p>
-</div>
-
-<a name="book"></a>
 ## 📚 E-book: *Dive Into OCR*
-- [Dive Into OCR ](./doc/doc_en/ocr_book_en.md)
-
-<a name="Community"></a>
-
-## 👫 Community
-
-- For international developers, we regard [PaddleOCR Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions) as our international community platform. All ideas and questions can be discussed here in English.
-
-- For Chinese develops, Scan the QR code below with your Wechat, you can join the official technical discussion group. For richer community content, please refer to [中文README](README_ch.md), looking forward to your participation.
-
-<div align="center">
-<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG"  width = "150" height = "150" />
-</div>
-
-<a name="Supported-Chinese-model-list"></a>
+- [Dive Into OCR](./doc/doc_en/ocr_book_en.md)
 
 ## 🛠️ PP-OCR Series Model List（Update on September 8th）
 
@@ -122,7 +99,6 @@ Scan the QR code below on WeChat to add operation students, and reply [paddlex],
 - For a new language request, please refer to [Guideline for new language_requests](#language_requests).
 - For structural document analysis models, please refer to [PP-Structure models](./ppstructure/docs/models_list_en.md).
 
-<a name="tutorials"></a>
 ## 📖 Tutorials
 - [Environment Preparation](./doc/doc_en/environment_en.md)
 - [PP-OCR 🔥](./doc/doc_en/ppocr_introduction_en.md)
@@ -182,8 +158,6 @@ Scan the QR code below on WeChat to add operation students, and reply [paddlex],
 - [References](./doc/doc_en/reference_en.md)
 - [License](#LICENSE)
 
-
-<a name="Visualization"></a>
 ## 👀 Visualization [more](./doc/doc_en/visualization_en.md)
 
 <details open>
@@ -244,10 +218,6 @@ Scan the QR code below on WeChat to add operation students, and reply [paddlex],
 <div align="center">
     <img src="https://user-images.githubusercontent.com/14270174/185540080-0431e006-9235-4b6d-b63d-0b3c6e1de48f.jpg" width="600">
 </div>
-
-</details>
-
-<a name="language_requests"></a>
 ## 🇺🇳 Guideline for New Language Requests
 
 If you want to request a new language support, a PR with 1 following files are needed：
@@ -259,7 +229,5 @@ If your language has unique elements, please tell me in advance within any way,
 
 More details, please refer to [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
 
-
-<a name="LICENSE"></a>
 ## 📄 License
 This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>
diff --git a/benchmark/PaddleOCR_DBNet/data_loader/modules/augment.py b/benchmark/PaddleOCR_DBNet/data_loader/modules/augment.py
@@ -25,7 +25,7 @@ def __call__(self, data: dict):
             return data
         data["img"] = (
             random_noise(data["img"], mode="gaussian", clip=True) * 255
-        ).astype(im.dtype)
+        ).astype(data["img"].dtype)
         return data
 
 

diff --git a/configs/rec/PP-OCRv4/en_PP-OCRv4_rec.yml b/configs/rec/PP-OCRv4/en_PP-OCRv4_rec.yml
@@ -10,7 +10,7 @@ Global:
   - 0
   - 2000
   cal_metric_during_train: true
-  pretrained_model: refactor
+  pretrained_model: null
   checkpoints: null
   save_inference_dir: null
   use_visualdl: false

diff --git a/deploy/hubserving/kie_ser/module.py b/deploy/hubserving/kie_ser/module.py
@@ -142,7 +142,7 @@ def serving_method(self, images, **kwargs):
 
 
 if __name__ == "__main__":
-    ocr = OCRSystem()
+    ocr = KIESer()
     ocr._initialize()
     image_path = [
         "./doc/imgs/11.jpg",

diff --git a/deploy/hubserving/kie_ser_re/module.py b/deploy/hubserving/kie_ser_re/module.py
@@ -144,7 +144,7 @@ def serving_method(self, images, **kwargs):
 
 
 if __name__ == "__main__":
-    ocr = OCRSystem()
+    ocr = KIESerRE()
     ocr._initialize()
     image_path = [
         "./doc/imgs/11.jpg",

diff --git a/doc/doc_ch/ppocr_introduction.md b/doc/doc_ch/ppocr_introduction.md
@@ -11,8 +11,8 @@
     - [5.2 模型训练、压缩、推理部署](#52)
 - [6. 模型库](#6)
 
-
 <a name="1"></a>
+
 ## 1. 简介
 
 PP-OCR是PaddleOCR自研的实用的超轻量OCR系统。在实现[前沿算法](algorithm.md)的基础上，考虑精度与速度的平衡，进行**模型瘦身**和**深度优化**，使其尽可能满足产业落地需求。
@@ -109,7 +109,7 @@ PP-OCRv3系统pipeline如下：
 <a name="52"></a>
 ### 5.2 模型训练、压缩、推理部署
 
-更多教程，包括模型训练、模型压缩、推理部署等，请参考[文档教程](../../README_ch.md#文档教程)。
+更多教程，包括模型训练、模型压缩、推理部署等，请参考[文档教程](../../README.md#文档教程)。
 
 <a name="6"></a>
 ## 6. 模型库

diff --git a/doc/doc_i18n/README_Ру́сский_язы́к.md b/doc/doc_i18n/README_Ру́сский_язы́к.md
@@ -1,4 +1,4 @@
-[English](../../README.md) | [简体中文](../../README_ch.md) | [हिन्दी](./README_हिन्द.md) | [日本語](./README_日本語.md) | [한국인](./README_한국어.md) | Pу́сский язы́к
+[English](../../README_en.md) | [简体中文](../../README.md) | [हिन्दी](./README_हिन्द.md) | [日本語](./README_日本語.md) | [한국인](./README_한국어.md) | Pу́сский язы́к
 
 <p align="center">
  <img src="../PaddleOCR_log.png" align="middle" width = "600"/>

diff --git a/doc/doc_i18n/README_हिन्द.md b/doc/doc_i18n/README_हिन्द.md
@@ -1,4 +1,4 @@
-[English](../../README.md) | [简体中文](../../README_ch.md) | हिन्दी | [日本語](./README_日本語.md) | [한국인](./README_한국어.md) | [Pу́сский язы́к](./README_Ру́сский_язы́к.md)
+[English](../../README_en.md) | [简体中文](../../README.md) | हिन्दी | [日本語](./README_日本語.md) | [한국인](./README_한국어.md) | [Pу́сский язы́к](./README_Ру́сский_язы́к.md)
 
 <p align="center">
  <img src="../PaddleOCR_log.png" align="middle" width = "600"/>

diff --git a/doc/doc_i18n/README_日本語.md b/doc/doc_i18n/README_日本語.md
@@ -1,4 +1,4 @@
-[English](../../README.md) | [简体中文](../../README_ch.md) | [हिन्दी](./README_हिन्द.md) | 日本語 | [한국인](./README_한국어.md) | [Pу́сский язы́к](./README_Ру́сский_язы́к.md)
+[English](../../README_en.md) | [简体中文](../../README.md) | [हिन्दी](./README_हिन्द.md) | 日本語 | [한국인](./README_한국어.md) | [Pу́сский язы́к](./README_Ру́сский_язы́к.md)
 
 <p align="center">
 <img src="../PaddleOCR_log.png" align="middle" width = "600"/>

diff --git a/doc/doc_i18n/README_한국어.md b/doc/doc_i18n/README_한국어.md
@@ -1,4 +1,4 @@
-[English](../../README.md) | [简体中文](../../README_ch.md) | [हिन्दी](./README_हिन्द.md) | [日本語](./README_日本語.md) | 한국인 | [Pу́сский язы́к](./README_Ру́сский_язы́к.md)
+[English](../../README_en.md) | [简体中文](../../README.md) | [हिन्दी](./README_हिन्द.md) | [日本語](./README_日本語.md) | 한국인 | [Pу́сский язы́к](./README_Ру́сский_язы́к.md)
 
 <p align="center">
  <img src="../PaddleOCR_log.png" align="middle" width = "600"/>

diff --git a/ppocr/data/imaug/label_ops.py b/ppocr/data/imaug/label_ops.py
@@ -841,11 +841,11 @@ def __call__(self, data):
         return data
 
     def xyxyxyxy2xywh(self, boxes):
-        new_bboxes = np.zeros([len(bboxes), 4])
-        new_bboxes[:, 0] = bboxes[:, 0::2].min()  # x1
-        new_bboxes[:, 1] = bboxes[:, 1::2].min()  # y1
-        new_bboxes[:, 2] = bboxes[:, 0::2].max() - new_bboxes[:, 0]  # w
-        new_bboxes[:, 3] = bboxes[:, 1::2].max() - new_bboxes[:, 1]  # h
+        new_bboxes = np.zeros([len(boxes), 4])
+        new_bboxes[:, 0] = boxes[:, 0::2].min()  # x1
+        new_bboxes[:, 1] = boxes[:, 1::2].min()  # y1
+        new_bboxes[:, 2] = boxes[:, 0::2].max() - new_bboxes[:, 0]  # w
+        new_bboxes[:, 3] = boxes[:, 1::2].max() - new_bboxes[:, 1]  # h
         return new_bboxes
 
     def xyxy2xywh(self, bboxes):
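
Note on the hunk above: the rename fixes the undefined `bboxes` name, but each `.min()` / `.max()` call is still made without an `axis` argument, so NumPy reduces over the whole array and every row receives the same value. If the intent is one enclosing rectangle per quadrilateral (an assumption; the diff does not say), the reduction would have to be per box, which in NumPy means passing `axis=1`. The dependency-free sketch below shows that per-box computation. `quads_to_xywh` is a hypothetical name, not part of PaddleOCR.

```python
from typing import List, Sequence


def quads_to_xywh(boxes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert [x1, y1, x2, y2, x3, y3, x4, y4] quads to [x, y, w, h].

    Hypothetical helper: each box is reduced on its own, which is what
    min(axis=1) / max(axis=1) would do on an (N, 8) NumPy array.
    """
    result = []
    for box in boxes:
        xs = box[0::2]  # x coordinates sit at even indices
        ys = box[1::2]  # y coordinates sit at odd indices
        x_min, y_min = min(xs), min(ys)
        result.append([x_min, y_min, max(xs) - x_min, max(ys) - y_min])
    return result


quads = [
    [0, 0, 10, 0, 10, 5, 0, 5],
    [20, 30, 24, 30, 24, 38, 20, 38],
]
print(quads_to_xywh(quads))
```

A check like this could also serve as a small test case under `tests/`, which the new `tests.yaml` workflow runs with pytest.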

diff --git a/ppocr/losses/distillation_loss.py b/ppocr/losses/distillation_loss.py
@@ -1184,7 +1184,9 @@ def forward(self, predicts, batch):
             loss = super().forward(out1, out2, ctc_label)
             if isinstance(loss, dict):
                 for key in loss:
-                    loss_dict["{}_{}_{}".format(self.name, model_name, idx)] = loss[key]
+                    loss_dict[
+                        "{}_{}_{}".format(self.name, self.model_name_pairs, idx)
+                    ] = loss[key]
             else:
                 loss_dict["{}_{}".format(self.name, idx)] = loss
         return loss_dict
diff --git a/ppocr/metrics/vqa_token_re_metric.py b/ppocr/metrics/vqa_token_re_metric.py
@@ -19,7 +19,7 @@
 import numpy as np
 import paddle
 
-__all__ = ["KIEMetric"]
+__all__ = ["VQAReTokenMetric"]
 
 
 class VQAReTokenMetric(object):

diff --git a/ppocr/metrics/vqa_token_ser_metric.py b/ppocr/metrics/vqa_token_ser_metric.py
@@ -19,7 +19,7 @@
 import numpy as np
 import paddle
 
-__all__ = ["KIEMetric"]
+__all__ = ["VQASerTokenMetric"]
 
 
 class VQASerTokenMetric(object):
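
The `__all__` corrections in the two metric modules above matter mostly for wildcard imports: when `__all__` lists a name the module does not define, `from module import *` raises `AttributeError`. The sketch below reproduces this with an in-memory module. `demo_metric` and `star_import` are made-up names for illustration only.

```python
import sys
import types


def star_import(module_name: str, source: str) -> dict:
    """Register an in-memory module built from `source`, then run
    `from <module_name> import *` and return the imported names."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    namespace: dict = {}
    try:
        exec(f"from {module_name} import *", namespace)
    finally:
        # Always unregister the throwaway module again.
        del sys.modules[module_name]
    namespace.pop("__builtins__", None)
    return namespace


# Same shape as the module before and after the fix.
stale = '__all__ = ["KIEMetric"]\nclass VQAReTokenMetric:\n    pass\n'
fixed = '__all__ = ["VQAReTokenMetric"]\nclass VQAReTokenMetric:\n    pass\n'

try:
    star_import("demo_metric", stale)
except AttributeError as err:
    print("stale __all__ fails:", err)

print(sorted(star_import("demo_metric", fixed)))
```

Explicit imports such as `from module import VQAReTokenMetric` do not consult `__all__`, which is likely why a stale entry can go unnoticed.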

diff --git a/ppocr/modeling/backbones/rec_efficientb3_pren.py b/ppocr/modeling/backbones/rec_efficientb3_pren.py
@@ -27,7 +27,7 @@
 import paddle.nn as nn
 import paddle.nn.functional as F
 
-__all__ = ["EfficientNetb3"]
+__all__ = ["EfficientNetb3_PREN"]
 
 GlobalParams = collections.namedtuple(
     "GlobalParams",

diff --git a/ppocr/modeling/heads/rec_aster_head.py b/ppocr/modeling/heads/rec_aster_head.py
@@ -132,7 +132,7 @@ def sample(self, x):
         # Decoder
         state = paddle.zeros([1, batch_size, self.sDim])
 
-        predicted_ids, predicted_scores = [], []
+        predicted_ids, predicted_scores, predicted = [], [], None
         for i in range(self.max_len_labels):
             if i == 0:
                 y_prev = paddle.full(shape=[batch_size], fill_value=self.num_classes)

diff --git a/ppocr/utils/dict/bengali_dict.txt b/ppocr/utils/dict/bengali_dict.txt
@@ -0,0 +1,74 @@
+হ
+থ
+শ
+৫
+ক
+ও
+য
+০
+গ
+দ
+ড়
+খ
+য়
+ঋ
+ন
+অ
+৪
+এ
+ব
+ঠ
+ঢ
+৭
+৯
+ধ
+ঙ
+ট
+ঝ
+ৎ
+ণ
+ত
+র
+২
+চ
+ঌ
+ড
+৬
+ঔ
+প
+ভ
+ম
+ঢ়
+ঈ
+৮
+ঘ
+১
+ষ
+৩
+ফ
+ছ
+ল
+জ
+আ
+।
+ঊ
+ই
+স
+ঐ
+উ
+ঞ
+া
+্
+ু
+ী
+ে
+ং
+ি
+়
+ঁ
+ৃ
+ো
+ূ
+ৈ
+ৌ
+ঃ