OCR prediction with PaddleX is very slow #2674

Open
809388027 opened this issue Dec 17, 2024 · 8 comments

@809388027

809388027 commented Dec 17, 2024

Describe the problem

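I am using PP-OCRv4_mobile_rec_pretrained in a CPU environment, and prediction is very slow. Previously I used the PaddleOCR library with the ch_ppocr_mobile_v2.0 model. On the same images, the two differ in prediction speed by nearly 3x. Is that normal?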

Reproduction

Example code using PaddleX:

import cv2

# create_predictor and global_config are defined elsewhere in the project (not shown)
ocr_det_model = create_predictor(global_config["model"]["det"])
ocr_rec_model = create_predictor(global_config["model"]["rec"])

output = ocr_det_model.predict(paths, batch_size=8)
all_cropped_images = []

# Process the detection result of each image
for idx, res in enumerate(output):
    # Assumes `paths` is a list of image paths and results come back in input order
    img = cv2.imread(paths[idx])

    for area in res.json["dt_polys"]:  # [pt1, pt2, pt3, pt4]
        # Axis-aligned bounding box of the detected quadrilateral
        x_min = int(min(point[0] for point in area))
        x_max = int(max(point[0] for point in area))
        y_min = int(min(point[1] for point in area))
        y_max = int(max(point[1] for point in area))

        cropped_img = img[y_min:y_max, x_min:x_max]
        all_cropped_images.append(cropped_img)

# Recognize all crops together, in batches of 8
rec_results = ocr_rec_model.predict(all_cropped_images, batch_size=8)
for rec_output in rec_results:
    print(rec_output.json)
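The cropping step above takes the axis-aligned bounding box of each detected quadrilateral. A minimal sketch of that step as a standalone helper, with two additions that are not in the original code: coordinates are clamped to the image bounds, and boxes that collapse to zero width or height are skipped, since slicing them yields an empty array that the recognizer may not accept. The name `crop_box` is hypothetical.

```python
from typing import Optional, Sequence

import numpy as np


def crop_box(img: np.ndarray, poly: Sequence[Sequence[float]]) -> Optional[np.ndarray]:
    """Crop the axis-aligned bounding box of a polygon (e.g. 4 points) from img.

    Coordinates are clamped to the image bounds. Returns None when the
    clamped box has zero width or height, because such a slice is empty.
    """
    h, w = img.shape[:2]
    xs = [float(p[0]) for p in poly]
    ys = [float(p[1]) for p in poly]
    # Clamp to [0, w] x [0, h] so out-of-image points do not wrap around
    x_min = max(0, int(min(xs)))
    x_max = min(w, int(max(xs)))
    y_min = max(0, int(min(ys)))
    y_max = min(h, int(max(ys)))
    if x_max <= x_min or y_max <= y_min:
        return None  # degenerate box: nothing to recognize
    return img[y_min:y_max, x_min:x_max]
```

Inside the loop, `cropped_img = crop_box(img, area)` followed by appending only when the result is not None would replace the inline min/max computation.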

Environment

OS: Windows 10
Python: 3.10
Paddle environment: PaddlePaddle (CPU) and PaddleX, both 3.0-beta2
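Before comparing against PaddleOCR, it helps to time detection and recognition separately, so the slower stage is identified, and to exclude the first call, which typically carries one-off warm-up cost. A minimal sketch using only the standard library; `time_stage` is a hypothetical helper, not part of PaddleX or PaddleOCR.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_stage(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Call fn repeatedly and report wall-clock latency in milliseconds.

    The first `warmup` calls are run but not measured, because they often
    include one-off costs such as lazy initialization or memory allocation.
    """
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
    }
```

For example, `time_stage(lambda: list(ocr_det_model.predict(paths, batch_size=8)))`. The `list()` wrapper is a safeguard under the assumption that `predict` may return a lazy iterator; without consuming it, no inference would happen inside the timed region. Measuring ch_ppocr_mobile_v2.0 under PaddleOCR the same way, on the same images, gives a like-for-like comparison.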

@cuicheng01
Collaborator

Why not use the PaddleX OCR pipeline directly?

@809388027
Author

> Why not use the PaddleX OCR pipeline directly?

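I do some data processing after detection finishes, so I chose to run the two stages separately. Would the pipeline be significantly faster? Yesterday I wrote a demo using the pipeline, and its speed seemed about the same.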

@cuicheng01
Collaborator

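The pipeline is reasonably fast. Could you share the demo images and your pipeline code? Also, the pipeline can run with the high-performance inference option, which uses C++ to accelerate pre- and post-processing.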

@wangwenqi567

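Sorry for the silly question: I put the models I downloaded offline into the corresponding folder, but the code still calls download_and_extract to download the models. What should I do?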

@xlcaptain

Exactly. Why is there no parameter for loading local model files directly?

@cuicheng01
Collaborator

There actually is such a parameter already; it just shares the name model_name. To avoid confusion, the two will be separated in the next release.

@cuicheng01
Collaborator

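> Sorry for the silly question: I put the models I downloaded offline into the corresponding folder, but the code still calls download_and_extract to download the models. What should I do?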

You can put the model you want to use for prediction into the specified directory, and it will no longer download the default model.
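Given the note that the model_name argument also accepts a local model directory, a cheap pre-flight check can confirm that an offline copy is complete before it is handed to the predictor. A minimal sketch; the helper name `resolve_model_arg` is hypothetical, and the expected file names (`inference.pdmodel`, `inference.pdiparams`, `inference.yml`) are an assumption about the layout of an exported model directory, so adjust them to match the directory you actually downloaded.

```python
from pathlib import Path
from typing import Iterable, Union

# Assumed contents of an exported inference model directory; adjust as needed.
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams", "inference.yml")


def resolve_model_arg(name_or_dir: Union[str, Path], expected: Iterable[str] = EXPECTED_FILES) -> str:
    """Return an absolute directory path if name_or_dir is a complete model dir.

    A value that is not an existing directory is returned unchanged and is
    treated as a built-in model name. An existing directory with missing
    files raises FileNotFoundError, so an incomplete copy is reported before
    the predictor is created.
    """
    path = Path(name_or_dir)
    if not path.is_dir():
        return str(name_or_dir)
    missing = [f for f in expected if not (path / f).is_file()]
    if missing:
        raise FileNotFoundError(f"{path} is missing: {', '.join(missing)}")
    return str(path.resolve())
```

For example, `create_predictor(resolve_model_arg(r"D:\models\PP-OCRv4_mobile_rec"))`, where the path is a placeholder. Note that a bare model name is also treated as a directory if a folder with that name exists in the working directory.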

@BigBroFinch

When I deploy the service in Docker with paddlex --serve --pipeline OCR --device cpu --port 4000, parsing results is also fairly slow.
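To see whether the time is spent inside the service, individual requests can be timed from a small client, sending the same image several times so the first (warm-up) request can be told apart from steady-state latency. A minimal standard-library sketch; the `/ocr` route and the `image` field holding Base64-encoded image content are assumptions about the serving API, so check the serving documentation for your PaddleX version. The default port matches the command above.

```python
import base64
import json
import time
import urllib.request


# Assumption: POST /ocr with a JSON body {"image": "<Base64 image content>"}.
def build_ocr_request(image_bytes: bytes, base_url: str = "http://localhost:4000") -> urllib.request.Request:
    """Build a JSON POST request carrying the Base64-encoded image."""
    body = json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/ocr",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_call(req: urllib.request.Request, timeout: float = 60.0):
    """Send req and return (parsed JSON response, latency in seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload, time.perf_counter() - start
```

For example, `payload, seconds = timed_call(build_ocr_request(image_bytes))`, where `image_bytes` holds the raw contents of a test image. A fresh request object per call keeps each measurement independent.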
