tl;dr note: if you just care about data scraped from the target page, jump to the Scraped results section
This repo contains a working mirror of the Biden-Harris Transition Agency Review Teams announcement page – and the wget script and other code to reproduce that mirror.
- Mirror: https://wgetsnaps.github.io/biden-harris-transition-teams/
- Original: https://buildbackbetter.com/the-transition/agency-review-teams/
- Wayback: http://web.archive.org/save/https://buildbackbetter.com/the-transition/agency-review-teams/
- Last updated: 2020-11-10
See wgetsnap.sh to check out the wget code for mirroring the page.
Caveat: This repo's scraped data is provided as-is, with absolutely no assurances or promises about its accuracy or integrity, so use it at your own risk! Of course, feel free to inspect and re-run the scraper script yourself.
Because this mirrored page and its HTML tables have newsworthy information, I've added a scraper script – scrape/scraper.py – which parses and extracts the tabular data from docs/index.html and outputs it as CSV.
The scraped results can be found at: scrape/data.csv
Or, if you'd like to see an interactive preview of the data on Google Sheets, click here
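For readers curious how HTML tables can be turned into CSV, here is a minimal sketch using only the Python standard library. It is not the code in scrape/scraper.py, which may work quite differently. The names TableParser and tables_to_csv are hypothetical, and the leading table-index column is an assumption made for illustration. Nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []    # list of tables; each table is a list of rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def tables_to_csv(html):
    """Return CSV text with one line per table row.

    The first column is the zero-based index of the table on the page
    (an assumption of this sketch, not a feature of the real scraper).
    """
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, table in enumerate(parser.tables):
        for row in table:
            writer.writerow([index] + row)
    return out.getvalue()
```

To try it against the mirror, read docs/index.html into a string, pass it to tables_to_csv, and write the result to a file of your choosing.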
- @Transition46 tweet: https://twitter.com/Transition46/status/1326257434080522241
- @alexkotch critiquing the Biden-Harris transition teams: https://twitter.com/alexkotch/status/1326266162330669056
- Wayback snapshot of Donald Trump's agency landing teams page: https://web.archive.org/web/20161217040522/https://greatagain.gov/agency-landing-teams-54916f71f462
As annoying as it is to have to scrape the Biden transition page to get data, it's a big improvement over the previous administration's agency teams page, which was basically a Medium blog post.
If you've cloned this repo and want to recreate the wget mirror yourself, check out the Makefile.
Basically:
- make snap: execute the script(s) that create a mirror of the target site (if ./docs doesn't already exist)
- make serve: view the locally mirrored site
- make clean: clean out an existing mirror (wget.log and ./docs/)