Web scraper for the German news site Deutsche Welle

acmbo/datascrapper

DW scraper Repo

A website scraper that gathers data from the German news website dw.com/dw. The project aims to create a database of:

  • articles
  • keywords
  • article meta: author, post date, recommendations
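
As a rough illustration of the data listed above, one scraped article could be represented as a single record. This is only a sketch: the class name `Article`, the field names, and the placeholder URL are assumptions made for the example and are not taken from the project code.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical record for one scraped article (all field names are assumptions)."""
    url: str
    title: str
    body: str
    keywords: List[str] = field(default_factory=list)
    author: Optional[str] = None
    post_date: Optional[str] = None  # ISO 8601 date string, e.g. "2020-01-31"
    recommendations: List[str] = field(default_factory=list)  # URLs of recommended articles

    def to_document(self) -> dict:
        """Return a plain dict that a document store can save as-is."""
        return asdict(self)


example = Article(
    url="https://www.dw.com/en/placeholder-article",  # placeholder, not a real article
    title="Placeholder title",
    body="Placeholder body text.",
    keywords=["placeholder"],
    author="Jane Doe",
    post_date="2020-01-31",
)
doc = example.to_document()
```

Keeping keywords and meta data inside the same record is one possible layout; separate collections per item type would work as well.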

Project structure

[Follows..]

Setup Databases

The scraper can use two kinds of databases:

  • MongoDB
  • Redis

Both are NoSQL databases that run on different systems and can store and query documents. They were chosen for this project because NoSQL is easy to use and allows a flexible choice of how much data to store behind each key.

Several different hardware architectures were available at the beginning of the project (a Raspberry Pi with ARM32 and a regular 64-bit Ubuntu server), so different implementations were needed.

Setting up the database structure

Setting up MongoDB is straightforward.

  1. Follow the steps to install and start a MongoDB server on localhost:27017 (Link)
  2. Run mongo.py in the src/db folder
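
Once the server from step 1 is running, storing a document from Python might look like the sketch below. It does not reproduce what mongo.py does. The helper names `connect` and `save_article`, and the database and collection names in the usage comment, are assumptions for illustration. The sketch relies on the pymongo package and its `MongoClient` and `update_one` calls.

```python
def connect(uri="mongodb://localhost:27017"):
    """Open a client for the local server; requires the pymongo package."""
    from pymongo import MongoClient  # imported lazily, only needed when connecting
    return MongoClient(uri)


def save_article(collection, document):
    """Insert the document, or update it if the same URL is already stored."""
    return collection.update_one(
        {"url": document["url"]},  # match on the article URL
        {"$set": document},        # overwrite the stored fields
        upsert=True,               # create the document if no match exists
    )


# Usage, assuming a running server (database and collection names are hypothetical):
#   client = connect()
#   articles = client["dw"]["articles"]
#   save_article(articles, {"url": "https://www.dw.com/en/placeholder", "title": "Placeholder"})
```

Matching on the URL with an upsert means that scraping the same article twice updates the stored document instead of creating a duplicate.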

Setting up Redis is a bit more tedious:

  1. Install Redis according to the project website
  2. Install redis-py with pip (Link). If you use conda, follow Link
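
With both steps done, a quick way to check the installation is to write and read one value through redis-py. The sketch below stores an article as a JSON string behind a key. The key scheme `article:<url>` and the helper names are assumptions for illustration, not the layout used by the project. The default Redis port 6379 is assumed.

```python
import json


def article_key(url):
    """Build the Redis key for an article (hypothetical key scheme)."""
    return f"article:{url}"


def store_article_json(client, document):
    """Serialize the document to JSON and store it behind its key."""
    client.set(article_key(document["url"]), json.dumps(document))


def load_article_json(client, url):
    """Return the stored document as a dict, or None if the key is missing."""
    raw = client.get(article_key(url))
    return json.loads(raw) if raw is not None else None


# Usage, assuming a local Redis server and the redis-py package:
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   store_article_json(client, {"url": "https://www.dw.com/en/placeholder", "title": "Placeholder"})
#   print(load_article_json(client, "https://www.dw.com/en/placeholder"))
```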
