
ahmedsbytes/Grepeto


What is this?

I was just playing around, trying to scrape all the technical blogs and websites I could find.

Basically, I do not do anything special: I abstracted three base classes, one per data source type, and each website's spider extends one of them and only defines its site-specific markup selectors.
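The pattern described above can be sketched roughly as follows. This is only an illustration, not the code in this repo: the names `XmlFeedSource` and `ExampleBlogFeed`, the sample feed, and the use of the standard library's `xml.etree.ElementTree` instead of Scrapy selectors are all assumptions made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET


class XmlFeedSource:
    """Hypothetical base class for one data source type (an XML feed)."""

    # Subclasses override these ElementTree path expressions.
    item_path = ""
    title_path = ""
    link_path = ""

    def parse(self, xml_text):
        """Return a list of {'title', 'link'} dicts extracted from xml_text."""
        root = ET.fromstring(xml_text)
        items = []
        for node in root.findall(self.item_path):
            items.append({
                "title": node.findtext(self.title_path, default="").strip(),
                "link": node.findtext(self.link_path, default="").strip(),
            })
        return items


class ExampleBlogFeed(XmlFeedSource):
    """Hypothetical site: it only declares where its data lives."""

    item_path = "./channel/item"
    title_path = "title"
    link_path = "link"


# Made-up sample feed, used only to exercise the sketch.
SAMPLE = """<rss><channel>
  <item><title>First post</title><link>https://example.com/1</link></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""

articles = ExampleBlogFeed().parse(SAMPLE)
print(articles)
```

With this layout, supporting another website of the same source type should only need a new subclass with different path attributes; the parsing logic stays in the base class.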

Will there be any more changes?

Nope, but if you find it useful and want to make use of it, open a PR and I will merge it, or tell me and I will give you access to the repo.

To install the environment

pip install scrapy pymongo slugify HTMLParser rdflib tagger dateparser python-dateutil sumy

To run spiders

scrapy crawl arstechnica

Don't forget to set the log level

scrapy crawl arstechnica -L INFO

To list spiders

scrapy list

Have fun :)

About

Just scraping some websites
