NDRC NEA Scraper V0.1
recherchetts committed Apr 30, 2019
1 parent 31a4fe0 commit 2f8abcd
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -42,8 +42,8 @@ Scrapy
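### Details

1. Incremental crawling (pages that were already crawled are not crawled again; uses the DeltaFetch library, backed by Berkeley DB)
-1. Use the Baidu AI platform to run image recognition on scanned documents
-1. Avoid blocking when reading attachments; reading large exceeds the configured time
+1. Use the Baidu AI platform to run image recognition on scanned documents (mainly for the National Energy Administration)
+1. Avoid blocking when reading attachments: reading a large PDF is abandoned automatically once it exceeds the configured time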
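
The last item (giving up on an attachment once reading takes too long) could be sketched roughly as below. This is only an illustration under stated assumptions, not the scraper's actual code: `read_with_timeout`, `slow_reader`, and `fast_reader` are hypothetical names, and a worker thread stands in for whatever mechanism the project really uses.

```python
import concurrent.futures
import time


def read_with_timeout(reader, path, timeout_s):
    """Run reader(path) in a worker thread; return None if it exceeds timeout_s.

    The worker thread is not killed on timeout, only abandoned. A
    process-based variant would be needed to actually stop the work.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(reader, path)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # Do not make the caller wait for a slow reader to finish.
        executor.shutdown(wait=False)


def slow_reader(path):
    # Hypothetical stand-in for parsing a large PDF.
    time.sleep(0.5)
    return "text of " + path


def fast_reader(path):
    # Hypothetical stand-in for parsing a small attachment.
    return "text of " + path
```

A caller would treat a `None` result as "skip this attachment", for example `read_with_timeout(slow_reader, "big.pdf", 0.05)`, so that one oversized file does not stall the rest of the crawl.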

# Elasticsearch Search Engine

