A set of functions and classes to help with web scraping and simple web audits.
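from pyscraper.sitemapper import Sitemapper
from pyscraper.scrapper import BatchScrape

# Collect the page URLs listed in the sitemap
sitemap = 'https://www.datascience.com/sitemap.xml'
page_urls = Sitemapper(sitemap)

# Scrape every page, separating the URLs that completed from the broken ones
completed_urls, broken_urls = BatchScrape(page_urls)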
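For readers unfamiliar with sitemaps, the step of turning a sitemap into a list of page URLs can be sketched with the standard library alone. This is a minimal illustration, not how `Sitemapper` is implemented: the helper name `extract_urls` and the example.com URLs are hypothetical, and the sketch assumes the sitemap follows the sitemaps.org XML format and has already been downloaded as a string.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap XML string.

    Hypothetical helper for illustration only; not part of pyscraper.
    """
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sitemap content, used only to demonstrate the helper.
example_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/blog/post-1</loc></url>"
    "</urlset>"
)

urls = extract_urls(example_sitemap)
```

Parsing from a string keeps the sketch independent of any network access; fetching the XML is left to whichever HTTP client you prefer.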
In addition, you can now render the hierarchical structure of the sitemap as a d3.js visualisation.
# Visualise pages
from pyscraper.viz import VisualiseSitemap
VisualiseSitemap(page_urls)
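To show what "hierarchical" means here, the sketch below nests URLs by path segment into a tree of `name`/`children` dictionaries, a shape that d3's hierarchy layouts can read through their default `children` accessor. It is an assumption about the kind of data such a visualisation needs, not a description of what `VisualiseSitemap` does internally; `build_tree` and the example URLs are hypothetical.

```python
import json
from urllib.parse import urlparse


def build_tree(urls):
    """Nest URLs by host and path segment into {"name", "children"} nodes.

    Hypothetical helper for illustration only; not part of pyscraper.
    """
    root = {"name": "root", "children": []}
    for url in urls:
        parsed = urlparse(url)
        # The host is the first level, followed by each non-empty path segment.
        segments = [parsed.netloc] + [s for s in parsed.path.split("/") if s]
        node = root
        for segment in segments:
            # Reuse an existing child with this name, or create a new one.
            child = next(
                (c for c in node["children"] if c["name"] == segment), None
            )
            if child is None:
                child = {"name": segment, "children": []}
                node["children"].append(child)
            node = child
    return root


# Made-up URLs, used only to demonstrate the helper.
tree = build_tree([
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-2",
    "https://www.example.com/about",
])

# JSON text that a d3.js page could load.
tree_json = json.dumps(tree, indent=2)
```

Pages that share a path prefix end up under the same parent node, so both blog posts above sit under a single `blog` node.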
The text from each page could then be visualised using this repository