Home
Qasim Iqbal edited this page Apr 16, 2016 · 9 revisions
Here's the starter code used when writing a new scraper:
```python
from ..utils import Scraper
from bs4 import BeautifulSoup
from collections import OrderedDict
from datetime import datetime, date
import json
import requests
import pytz


class ScraperName:
    """A scraper for <scraper description>."""

    host = '<scraper website>'

    @staticmethod
    def scrape(location='.'):
        Scraper.logger.info('ScraperName initialized.')
        # do the magic here
        Scraper.logger.info('ScraperName completed.')
```
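To make the "magic" step concrete, here is a minimal standalone sketch of what a scrape body typically does: parse HTML with BeautifulSoup, collect the results into an OrderedDict, and write a JSON file under the location path. The HTML string, the `course` class, and the `courses.json` filename are all hypothetical stand-ins; a real scraper would fetch the page from its `host` with requests instead.

```python
import json
import os
from collections import OrderedDict
from bs4 import BeautifulSoup

# Hypothetical page content; a real scraper would fetch it, e.g.
#   html = requests.get(ScraperName.host).text
html = '<ul><li class="course">CSC108</li><li class="course">CSC148</li></ul>'


def scrape(location='.'):
    soup = BeautifulSoup(html, 'html.parser')
    # Pull out each course code from the (made-up) list items
    courses = [li.text for li in soup.find_all('li', class_='course')]
    # OrderedDict keeps the JSON keys in the order they are inserted
    doc = OrderedDict([('courses', courses)])
    os.makedirs(location, exist_ok=True)
    with open(os.path.join(location, 'courses.json'), 'w') as f:
        json.dump(doc, f)


scrape('./data')
```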
Remember that output goes to JSON files, using the location parameter as the output path. Also, the dictionary used to dump the JSON should be an OrderedDict to preserve key order.
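As a quick illustration of the OrderedDict point (a standalone sketch with made-up field values, not taken from the repo), the JSON keys come out exactly in insertion order:

```python
import json
from collections import OrderedDict

# Build the document with fields in a deliberate order, then dump it;
# json.dumps emits the keys in the same order they were inserted.
doc = OrderedDict([
    ('id', 'CSC108H1'),
    ('name', 'Introduction to Computer Programming'),
    ('campus', 'St. George'),
])
print(json.dumps(doc))
```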
I test the scrapers using a test.py script like the following:
```python
import uoftscrapers
import logging
import sys

# Set up logging so it prints to standard output
logger = logging.getLogger("uoftscrapers")
logger.setLevel(logging.INFO)
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
logger.addHandler(ch)

# Run scraper
uoftscrapers.ScraperName.scrape(location='./data')
```
Placed at the root of the repo, this script will import uoftscrapers from the source tree rather than the copy pip3 is managing.