scrapy_tianya

A crawler for bbs.tianya.cn, using scrapy as crawler framework

PS: Actually you can rewrite tianya/pipelines.py to change the storage backend, instead of mongodb :)

PPS: xpath links are easy to get in Chrome Developer Tool

cd path/to/repo
mkdir job
scrapy crawl tianyaSpider -s JOBDIR=/path/to/job/job-1_or_whatever

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
tianya		tianya
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
scrapy.cfg		scrapy.cfg

Provide feedback