Crawl phone numbers and titles from www.zhaopin.com/ and www.51job.com/ for job fair sponsors.
Scrapy + Selenium + PhantomJS + Redis
-
Scrapy :
virtualenv ~/scrapy
source ~/scrapy/bin/activate
pip install scrapy
-
PhantomJS :
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
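tar xjf phantomjs-2.1.1-linux-x86_64.tar.bz2
copy phantomjs-2.1.1-linux-x86_64/bin/phantomjs to a directory on the system PATH (e.g. /usr/local/bin)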
-
openpyxl (save data to Excel) :
pip install openpyxl (run inside the activated virtualenv; sudo would install into the system Python instead)
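
A minimal sketch of saving crawled rows to Excel with openpyxl. The column names (`phone`, `title`), the sheet name, and both helper functions are assumptions made for illustration; they are not taken from this project's code.

```python
from openpyxl import Workbook, load_workbook


def save_rows(path, rows, header=("phone", "title")):
    """Write a header row followed by the data rows to a new .xlsx file."""
    wb = Workbook()
    ws = wb.active          # a new workbook always starts with one sheet
    ws.title = "contacts"   # hypothetical sheet name
    ws.append(list(header))
    for row in rows:
        ws.append(list(row))
    wb.save(path)


def load_rows(path):
    """Read the data rows back (header excluded) as a list of tuples."""
    ws = load_workbook(path).active
    return [tuple(r) for r in ws.iter_rows(min_row=2, values_only=True)]
```

Passing phone numbers as strings keeps leading zeros and stops Excel from showing them in scientific notation.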
-
Redis :
pip install redis
redis-cli -h localhost -p 6379 -a {password} --raw
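
The same connection can be opened from Python with the redis package. The sketch below assumes results are kept in a Redis hash with the phone number as the field and the title as the value; that layout and the helper names are assumptions, not confirmed by this README.

```python
def connect(host="localhost", port=6379, password=None):
    """Return a redis-py client for the server used by redis-cli above."""
    import redis  # third-party: pip install redis (imported lazily on purpose)
    return redis.Redis(host=host, port=port, password=password,
                       decode_responses=True)  # return str instead of bytes


def save_contact(client, hash_name, phone, title):
    """Store one phone -> title pair as a field of the given hash (HSET)."""
    client.hset(hash_name, phone, title)


def load_contacts(client, hash_name):
    """Return the whole hash as a dict (HGETALL)."""
    return client.hgetall(hash_name)
```

For example, `load_contacts(connect(password="..."), "qiancheng_zhengzhou")` would return every stored pair, provided the hash really uses that layout.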
-
run :
scrapy crawl {spider_name}
or use the release script :
cd release/ && python release.py
-
export data from Redis (omit -a {password} if the server has no password) :
redis-cli -h localhost -p 6379 -a {password} --raw
default Redis RDB path on Ubuntu : /var/lib/redis/dump.rdb
echo "HGETALL qiancheng_zhengzhou" | redis-cli -h localhost -p 6379 -a {password} --raw >> qiancheng_zhengzhou_all.txt
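
With `--raw`, `HGETALL` prints each field and its value on alternating lines, so the exported text file can be read back into a dictionary. The sketch below shows one way to do that; the function names are hypothetical, and it assumes no field or value contains a newline.

```python
def parse_hgetall_dump(lines):
    """Pair up alternating field/value lines into a dict."""
    items = [line.rstrip("\n") for line in lines]
    # Ignore trailing blank lines left at the end of the file.
    while items and items[-1] == "":
        items.pop()
    if len(items) % 2:
        raise ValueError("expected an even number of lines (field, value, ...)")
    # Even positions are fields, odd positions are their values.
    return dict(zip(items[0::2], items[1::2]))


def load_dump(path):
    """Read a file such as qiancheng_zhengzhou_all.txt into a dict."""
    with open(path, encoding="utf-8") as f:
        return parse_hgetall_dump(f)
```

Because the export command appends with `>>`, running it twice leaves duplicate pairs in the file; building a dict keeps only the last value seen for each field.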