What is this?

I was just playing around, trying to scrape all the technical blogs and websites I come across.

Basically, I do not do anything special: I abstracted three base classes, one per data source type, and each website extends one of them just to define its specific markup selectors.

Will there be any more changes?

Nope, but if you find it useful and want to make use of it, open a PR and I will merge it, or tell me and I will give you access to the repo.

To install the environment

pip install scrapy pymongo slugify HTMLParser rdflib tagger dateparser python-dateutil sumy

To run spiders

scrapy crawl arstechnica

Don't forget the log level

scrapy crawl arstechnica -L INFO

To list spiders

scrapy list

Have fun :)