Crawls an RSS feed or a site, then inserts the results into MongoDB, writes them to a file, or pushes them to Redis.
If your site does not have an RSS feed, you need to write XPath expressions for it.
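
For a site that does expose a feed, pulling out the entries could look roughly like the sketch below. It is not the crawler's actual code: the function name `parse_rss_items` and the sample feed are made up for illustration, and it assumes a plain RSS 2.0 document with `<item>`, `<title>`, and `<link>` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would download the feed URL instead.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example announcements</title>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(rss_text):
    """Return one {"title", "url"} dict per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None for a missing child, so fall back to "".
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
        })
    return items


entries = parse_rss_items(SAMPLE_RSS)
for entry in entries:
    print(entry["title"], entry["url"])
```

Each resulting dict could then be handed to whichever output is in use (MongoDB, a file, or Redis).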
$ # Just run the init.sh file
$ bash init.sh # Default: the crawler runs every 60 seconds
$ bash init.sh 10 # The crawler runs every 10 seconds
{
  "Site": "Yildiz Teknik University",
  "SiteLink": "https://ytuce.maliayas.com/",
  "SiteRssLink": "https://ytuce.maliayas.com/?type=rss",
  "ListXpath": "//div[@class='text_title']",
  "UrlXpath": "a/@href",
  "TitleXpath": "a/text()"
}
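
One plausible reading of the three XPath fields is that `ListXpath` selects the item nodes and `UrlXpath` and `TitleXpath` are evaluated relative to each item. The sketch below illustrates that reading only; it is an assumption, not the crawler's implementation. The helper names `select_value` and `extract_items` and the sample page are hypothetical. It uses the standard library's ElementTree, which supports only a subset of XPath and requires well-formed markup, so the trailing `/@attr` and `/text()` steps are handled by hand. A real crawler would more likely rely on a library with full XPath support, such as lxml.

```python
import xml.etree.ElementTree as ET

# Field names and values come from the config above; the page is made up.
CONFIG = {
    "ListXpath": "//div[@class='text_title']",
    "UrlXpath": "a/@href",
    "TitleXpath": "a/text()",
}

SAMPLE_PAGE = """<html><body>
  <div class="text_title"><a href="/post/1">Exam schedule</a></div>
  <div class="text_title"><a href="/post/2">Lab hours</a></div>
  <div class="other"><a href="/ignored">Ignored</a></div>
</body></html>"""


def select_value(node, xpath):
    """Evaluate a relative path ending in /@attr or /text() against node.

    ElementTree cannot return attributes or text nodes directly, so the
    last step is resolved manually. Returns None when nothing matches.
    """
    path, _, last = xpath.rpartition("/")
    target = node.find(path) if path else node
    if target is None:
        return None
    if last.startswith("@"):
        return target.get(last[1:])
    if last == "text()":
        return target.text
    # Fallback: treat the whole expression as an element path.
    found = node.find(xpath)
    return None if found is None else found.text


def extract_items(page_text, config):
    """Apply ListXpath, then UrlXpath and TitleXpath relative to each match."""
    root = ET.fromstring(page_text)
    list_path = config["ListXpath"]
    # ElementTree needs a leading "." for document-wide searches.
    if list_path.startswith("/"):
        list_path = "." + list_path
    items = []
    for node in root.findall(list_path):
        items.append({
            "url": select_value(node, config["UrlXpath"]),
            "title": select_value(node, config["TitleXpath"]),
        })
    return items


items = extract_items(SAMPLE_PAGE, CONFIG)
for item in items:
    print(item["title"], item["url"])
```

Only the two `div` elements with `class="text_title"` are picked up; the third `div` is skipped because it does not match `ListXpath`.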
University | Crawling Site | Status |
---|---|---|
Yildiz Technical | https://ytuce.maliayas.com/?type=rss | Ok |
Istanbul | http://ce.istanbul.edu.tr/ | Nope |
Pamukkale | http://www.pamukkale.edu.tr/bilgisayar | WIP |
Istanbul Technical | http://www.bb.itu.edu.tr/ | Nope |
Anadolu | https://anadolu.edu.tr/duyurular | Nope |
Reddit Python | https://www.reddit.com/r/Python/.rss | Ok |