?> An advanced feature: extract text from an image, using either an online API or a local offline engine. The local offline engine runs on any CPU model, with GPU acceleration available only on NVIDIA graphics cards, and it can read both horizontal and vertical text.

Introduction

SunnyCapturer ships with a built-in account by default, but its stability is not guaranteed, and no commitment or liability is assumed for it.

Online engine:

  • Built-in account

    • Pros: Free of charge; the quota resets on the 1st of every month; no setup required.

    • Cons: Shared by all users, so it occasionally hits the concurrency limit, and the quota may run out before the end of the month.

  • Private account

    • Pros: Used by you alone; register on the platform yourself and enter the key in the settings screen.
    • Cons: Paid once the base quota is used up, and can be expensive.

Local engine: an offline engine is available for the Windows platform.

  • For Windows, we provide an installer of the corresponding version with offline OCR recognition included, ready to use once downloaded; alternatively, download the extension package, import it into the program's root directory, and restart the software.

Currently, the following cloud API platforms and one offline local engine are supported:

Register for Tencent Cloud | Tencent Cloud

『Picture Translation』 has a monthly quota of 10,000 calls, reset at the beginning of the month; 『Text Translation』 includes a free package of 5 million characters per month. Once the paid service is activated, the QPS limit for the free package is 10 for both individual and enterprise verified users.

Official website registration: https://cloud.tencent.com

Key Generation: https://console.cloud.tencent.com/cam

Registration diagram:

Key location after completing this procedure:



Register for Baidu Cloud | Baidu Cloud

The “Text Recognition” quota is 3,500 calls per month, reset at the beginning of the month; the QPS concurrency limit is 2. The 3,500 breaks down as 1,000 (standard version) + 1,000 (standard version with location) + 1,000 (high-precision version) + 500 (high-precision version with location).
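For anyone calling the API with a private key from their own scripts, the QPS limit of 2 mentioned above can be respected on the client side by spacing out requests. The following is a minimal sketch, not part of SunnyCapturer; the class name `MinIntervalThrottle` and its parameters are hypothetical, and the actual request call is left out.

```python
import time


class MinIntervalThrottle:
    """Spaces calls so that at most `qps` calls start per second.

    Hypothetical helper for illustration; not a SunnyCapturer or Baidu API.
    """

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        # No call has been made yet, so the first call is never delayed.
        self._next_allowed = float("-inf")

    def wait(self):
        """Block until the next call is allowed; return the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # The next call may start one interval after this one starts.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = MinIntervalThrottle(qps=2)  # free-tier limit stated above
    for i in range(4):
        throttle.wait()
        print("request", i, "would be sent now")
```

Call `wait()` immediately before each request. With `qps=2`, consecutive requests start at least 0.5 seconds apart.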

Registration on official website: https://cloud.baidu.com

Registration diagram:

Key location after completing this procedure:



Image recognition with the local offline engine

Extract text from images with the local offline engine, with no limit on the number of calls; the speed depends on your CPU / NVIDIA graphics card model. It supports text extraction in common mainstream languages, includes the Simplified Chinese package by default, and can read vertical text.

Advantages: Runs locally and offline, keeping data private and secure; performance scales with the CPU/GPU model; supports importing and recognizing images and folders in batches.

  • CPU version: Broad compatibility and lower memory consumption; very fast for a single image, but time-consuming for batches of images (recommended for general users).
  • GPU version: Supports NVIDIA cards only and uses more memory, but for batch parsing of a large number of images it takes 1/3 to 1/2 of the time the CPU version needs (recommended for users with high-end NVIDIA cards).
  • Both versions support only 64-bit Windows; both handle single-image and batch recognition, and images can be dragged and dropped directly onto the window.
  • In theory Linux is also supported; Linux builds will be adapted and released in the future.

Cloud API OCR Text Extraction: Returned Error Codes and Their Meanings

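OCR text extraction error codes and their meanings

Tencent Cloud (online), from the official document:

| Error code | Description |
| --- | --- |
| FailedOperation.ErrorUserArea | The user's region does not match the region of the requested service. |
| FailedOperation.InsertErr | Data insertion error. |
| FailedOperation.LanguageRecognitionErr | The language cannot be recognized at this time. |
| FailedOperation.NoFreeAmount | This month's free quota is used up. To continue, upgrade to paid usage in the Machine Translation console. |
| FailedOperation.RequestAiLabErr | Internal request error. |
| FailedOperation.ServiceIsolate | Service stopped because the account is in arrears. Top up your Tencent Cloud account. |
| FailedOperation.StopUsing | Service for the account has been stopped. |
| FailedOperation.SubmissionLimitReached | The number of tasks submitted today has reached the limit. |
| FailedOperation.TooManyWaitProcess | Too many unfinished tasks. |
| FailedOperation.UserHasNoFreeAmount | This month's free quota is used up. To continue, purchase a resource package or enable postpaid billing in the Machine Translation console. |
| FailedOperation.UserNotRegistered | The service is not activated. Activate it in the Machine Translation console on the Tencent Cloud website. |
| InternalError.BackendTimeout | The backend service timed out. Retry later. |
| InternalError.ErrorGetRoute | Route retrieval error. |
| InternalError.ErrorUnknown | Unknown error. |
| InternalError.RequestFailed | Request failed. |
| InvalidParameter.DuplicatedSessionIdAndSeq | Duplicate SessionUuid and Seq combination. |
| InvalidParameter.MissingParameter | Parameter error. |
| InvalidParameter.SeqIntervalTooLarge | The gap between Seq values must not exceed 2000. |
| LimitExceeded.LimitedAccessFrequency | Request frequency exceeded. |
| UnauthorizedOperation.ActionNotFound | Provide the correct Action field name. |
| UnsupportedOperation.AudioDurationExceed | The audio segment exceeds the length limit. Keep each segment shorter than 8 s. |
| UnsupportedOperation.TextTooLong | The text in a single request exceeds the length limit. |
| UnsupportedOperation.UnSupportedTargetLanguage | Unsupported target language. See the language list. |
| UnsupportedOperation.UnsupportedLanguage | Unsupported language. See the language list. |
| UnsupportedOperation.UnsupportedSourceLanguage | Unsupported source language. See the language list. |

Baidu Cloud (online), from the official document:

| Error code | Error message | Description |
| --- | --- | --- |
| 1 | Unknown error | Unknown error. Retry the request; if the error persists, submit a ticket in the console to contact technical support. |
| 2 | Service temporarily unavailable | The service is temporarily unavailable. Retry the request; if the error persists, submit a ticket in the console to contact technical support. |
| 3 | Unsupported openapi method | The API being called does not exist. Check the request URL and try again; the usual cause is a non-English character in the URL, such as "-", which can be retyped by hand. |
| 4 | Open api request limit reached | Cluster quota exceeded. Retry the request; if the error persists, submit a ticket in the console to contact technical support. |
| 6 | No permission to access data | No permission to call the interface: the relevant text recognition interface was not selected when the application was created. Log in to the Baidu Cloud console, find the application, edit it, select the relevant interfaces, and call again. The permission and quota diagnostic tool can also be used for self-diagnosis. |
| 14 | IAM Certification failed | IAM authentication failed. Check against the documentation whether the sign is generated correctly, or call with the AK/SK from the console instead. |
| 17 | Open api daily request limit reached | Free test resources are used up and the daily request quota is exceeded. For interfaces that support billing, purchase a call package or enable postpaid pay-as-you-go billing under the text recognition service in the console. For invitation-only test interfaces and interfaces without billing, submit a ticket in the console to request a higher limit, or first use the permission and quota diagnostic tool. |
| 18 | Open api qps request limit reached | QPS quota exceeded. The free quota is limited to 2 QPS; after enabling postpaid pay-as-you-go billing or purchasing a call package, the limit is 10 QPS. For more concurrency, purchase a QPS add-on package. For invitation-only test interfaces and interfaces without billing, submit a ticket in the console to request a higher limit, or first use the permission and quota diagnostic tool. |
| 19 | Open api total request limit reached | Total request quota exceeded. For interfaces that support billing, purchase a call package or enable postpaid pay-as-you-go billing under the text recognition service in the console. For invitation-only test interfaces and interfaces without billing, submit a ticket in the console to request a higher limit. |
| 100 | Invalid parameter | Invalid access_token parameter; fetching the token failed. See the "Access Token acquisition" document to obtain it again. |
| 110 | Access token invalid or no longer valid | The access_token is invalid. A token is valid for 30 days and must be renewed regularly; alternatively, fetch a new token for every request. |
| 111 | Access token expired | The access token has expired. A token is valid for 30 days and must be renewed regularly; alternatively, fetch a new token for every request. |
| 216100 | invalid param | The request contains an invalid parameter. Check and try again. |
| 216101 | not enough param | A required parameter is missing. Check whether any parameter was omitted. |
| 216102 | service not support | An unsupported service was requested. Check the URL being called. |
| 216103 | param too long | Some parameters in the request are too long. Check and try again. |
| 216110 | appid not exist | The appid does not exist. Verify that it matches an appid in the console's application list. |
| 216200 | empty image | The image is empty. Check and try again. |
| 216201 | image format error | The uploaded image format is wrong. Currently supported formats are PNG, JPG, JPEG, and BMP; convert the image or use another one. |
| 216202 | image size error | The uploaded image size is wrong. Check the requirements for the image request parameter in the interface documentation and upload again. |
| 216205 | input oversize | The request body size is wrong. The current maximum is under 10 MB after base64 encoding; resend the request. |
| 216306 | Upload file error | File upload failed. Check the request parameters of the submission interface. |
| 216308 | Pdf_file_num exceeds the number of pdf pages | The pdf_file_num parameter is greater than the actual number of pages in the PDF file. |
| 216401 | Create task failed | Submitting the request failed. |
| 216402 | Query task failed | Retrieving the result failed. |
| 216603 | Check pdf page num failed | Failed to get the page count of the PDF file. Check the PDF file and its base64 encoding. |
| 216604 | Insufficient available quota | Total request quota exceeded. Purchase or apply for more quota. |
| 216630 | recognize error | Recognition error. Retry the request, and make sure the image contains the relevant card, certificate, or receipt. |
| 216631 | recognize bank card error | Bank card recognition error. The usual causes: the image is not the front of a bank card, the card has an irregular shape, or the image of the front is incomplete or blurry. |
| 216633 | recognize idcard error | ID card recognition error. The usual causes: the image is not an ID card, or the ID card image is incomplete or blurry. |
| 216634 | detect error | Detection error. Retry the request; if the error persists, submit a ticket in the console to contact technical support. |
| 216600 | business verify failed | A request to an enterprise verification service failed. Retry the request. Applies only to enterprise verification services: enterprise business registration information query (standard/advanced) and enterprise two-/three-/four-factor verification. If the error persists, submit a ticket in the console to contact technical support. |
| 216601 | business verify result empty | The enterprise verification query succeeded but returned no result. Retry the request. Applies only to enterprise verification services: enterprise business registration information query (standard/advanced) and enterprise two-/three-/four-factor verification. If the error persists, submit a ticket in the console to contact technical support. |
| 216602 | business verify timeout | The enterprise verification interface timed out. Retry the request. Applies only to enterprise verification services: enterprise business registration information query (standard/advanced) and enterprise two-/three-/four-factor verification. If the error persists, submit a ticket in the console to contact technical support. |
| 282000 | internal error | Internal server error. With a high-precision interface, the cause may be that the image contains too much text and recognition timed out; split the image and recognize it again. Otherwise, retry the request; if the error persists, submit a ticket in the console to contact technical support. |
| 282003 | missing parameters: {parameter name} | A request parameter is missing. |
| 282005 | batch processing error | Some or all items failed while processing a batch task. Troubleshoot according to the specific error codes. |
| 282006 | batch task limit reached | The number of batch tasks exceeds the limit. Reduce it to 10 or fewer. |
| 282100 | image transcode error | Image compression or transcoding error. |
| 282102 | target detect error | No recognition target was detected in the image. Make sure the image contains the relevant card, certificate, or receipt. The usual causes: the image is not a card or certificate, or it is incomplete or blurry. |
| 282103 | recognize error, failed to match the template | Image target recognition error. Make sure the image contains the relevant card, certificate, or receipt. The usual causes: the image is not a card or certificate, or it is incomplete or blurry. |
| 282110 | urls not exit | The URL parameter does not exist. Verify the URL and submit again. |
| 282111 | url format illegal | The URL format is invalid. Check that it meets the input requirements of the interface. |
| 282112 | url download timeout | Downloading from the URL timed out. Possible causes: the image host or image cannot be downloaded, the connection is poor, the image is larger than 3 MB, or the image has hotlink protection. Try again; if it still fails after several attempts, use a different image address. |
| 282113 | url response invalid | The URL returned invalid parameters. |
| 282114 | url size error | The URL length exceeds 1024 bytes or is 0. |
| 282134 | officialWeb service exception | Applies only to the VAT invoice verification interface. Network timeout on the national tax administration side (usually caused by a local tax bureau upgrade or system adjustment; retry the next day, and if the error persists, submit a ticket in the console to contact technical support). |
| 282808 | request id: xxxxx not exist | The request id xxxxx does not exist. |
| 282809 | result type error | The requested result type is wrong (neither excel nor json). |
| 282810 | image recognize error | Image recognition error. Retry the request; if the error persists, submit a ticket in the console to contact technical support. |
| 282160 | driving license backend resource overrun | Backend resources exceeded. Applies only to the vehicle license verification interface; submit a ticket in the console to contact technical support. |
| 282161 | driving license requests too frequently | Requests are too frequent. Applies only to the vehicle license verification interface. |

Local engine (offline), see the official error code documentation: error codes omitted for now.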
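When calling the Baidu Cloud API with a private key, the codes listed above can be grouped by the kind of action they call for. The sketch below is one possible reading of the table, not an official classification: the function name `classify_baidu_error`, the category labels, and the choice of which codes go into each group are all assumptions made for illustration.

```python
# Grouping is an assumption based on the descriptions in the table above.
AUTH_CODES = {6, 14, 100, 110, 111}        # permission or token problems
QUOTA_CODES = {17, 19, 216604}             # daily or total quota exhausted
RETRY_CODES = {1, 2, 4, 18, 216634, 282000, 282810}  # "retry the request" or QPS limit


def classify_baidu_error(code):
    """Map a numeric Baidu error code to a suggested action (hypothetical labels)."""
    if code in AUTH_CODES:
        return "fix_credentials"
    if code in QUOTA_CODES:
        return "quota_exhausted"
    if code in RETRY_CODES:
        return "retry_later"
    # Everything else in the table points at the request itself
    # (parameters, image format or size, URL, and so on).
    return "check_request"


if __name__ == "__main__":
    for code in (18, 110, 17, 216201):
        print(code, classify_baidu_error(code))
```

A caller might wait and resend on `retry_later`, refresh the access token or check the application's interface permissions on `fix_credentials`, and stop sending requests until the quota resets on `quota_exhausted`.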

API Privacy Agreements by Platform