wgetsnaps/biden-harris-transition-teams

Archive and scrape of Biden-Harris Transition Agency Review Team Members

tl;dr note: if you just care about data scraped from the target page, jump to the Scraped results section

This repo contains a working mirror of the Biden-Harris Transition Agency Review Teams announcement page – and the wget script and other code to reproduce that mirror.

page-screenshot.png

Code and data

See wgetsnap.sh for the wget commands used to mirror the page.

Scraped results

Caveat: This repo's scraped data is provided as-is, with absolutely no assurances or promises about its accuracy or integrity, so use it at your own risk! Of course, feel free to inspect and re-run the scraper script yourself.

Because this mirrored page and its HTML tables have newsworthy information, I've added a scraper script – scrape/scraper.py – which parses and extracts the tabular data from docs/index.html and outputs it as CSV.

The scraped results can be found at: scrape/data.csv

Or, if you'd like to see an interactive preview of the data on Google Sheets, click here

preview-sheet.png
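For readers curious what a table-to-CSV scrape like this involves, here is a minimal sketch using only Python's standard library. It is not the contents of scrape/scraper.py, which may work quite differently; the names `TableRowParser` and `html_tables_to_csv` are hypothetical, and the sketch assumes the data lives in plain `<tr>`/`<td>`/`<th>` markup with no nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; assumes no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so each cell is a one-line string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_tables_to_csv(html_text):
    """Return all table rows found in html_text as one CSV string."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

To try it against the mirror, read docs/index.html into a string, pass it to `html_tables_to_csv`, and write the result to a file. Because this sketch flattens every table into one sequence of rows, it drops any per-table context (such as which agency a table belongs to), so compare its output against scrape/data.csv rather than treating it as a replacement.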

Related links

As annoying as it is to have to scrape the Biden transition page to get data, it's a big improvement over the previous administration's agency teams page, which was basically a Medium blog post:

2016-page-screenshot.png

Developer notes

If you've cloned this repo and want to recreate the wget mirror yourself, check out the Makefile.

Basically:

  • make snap to execute the script(s) that create a mirror of the target site (if ./docs doesn't already exist)

  • make serve to view the locally mirrored site

  • make clean to clean out an existing mirror (wget.log and ./docs/)
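If you don't have make available, the "serve" step can be approximated with Python's built-in HTTP server. This is a sketch under assumptions, not what the Makefile actually runs: the port (8000) and the function names `make_server` and `serve` are made up for illustration, and only the ./docs directory name comes from this README.

```python
import functools
import http.server


def make_server(directory="docs", port=8000):
    """Build a local-only HTTP server that serves files from `directory`.

    The default port is an arbitrary choice for this sketch.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to 127.0.0.1 so the mirror is only reachable from this machine.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(directory="docs", port=8000):
    """Serve `directory` until interrupted with Ctrl-C."""
    with make_server(directory, port) as httpd:
        host, bound_port = httpd.server_address[:2]
        print(f"Serving {directory}/ at http://{host}:{bound_port}/")
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            pass
```

Calling `serve()` from the repo root and opening the printed URL in a browser should show the mirrored page, assuming ./docs contains an index.html.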
