Scrapes the AWS documentation and saves each page as an HTML file.
The sitemap index comes from https://docs.aws.amazon.com/sitemap_index.xml. Documentation for the AWS SDKs was removed, reducing the number of documents from 600k+ to 100k+.
The sitemap index lists the XML sitemaps for the AWS documentation topics.
Reads the sitemap index and aggregates the HTML files listed within each XML sitemap in main.xml. Outputs the resulting URLs to html_urls.json for review.
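
The aggregation step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names extract_locs and collect_html_urls are hypothetical, fetch_sitemap stands in for whatever downloads a sitemap's XML, and filtering on a ".html" suffix is an assumption about how HTML pages are selected. The namespace is the standard one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for index and URL-set files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return the text of every <loc> element in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def collect_html_urls(index_xml, fetch_sitemap):
    """Collect the .html URLs from every sitemap listed in the index.

    fetch_sitemap is a hypothetical callable: it takes a sitemap URL and
    returns that sitemap's XML as text or bytes.
    """
    urls = []
    for sitemap_url in extract_locs(index_xml):
        for url in extract_locs(fetch_sitemap(sitemap_url)):
            # Assumption: documentation pages are identified by a .html suffix.
            if url.endswith(".html"):
                urls.append(url)
    return urls
```

The returned list can then be written out with json.dump to produce a file such as html_urls.json. Passing the fetcher in as an argument keeps the parsing logic testable without network access.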
Saves each HTML document listed in html_urls.json to the "aws-docs" folder.
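
A sketch of the save step, under stated assumptions: html_urls.json is taken to be a flat JSON list of URL strings, and each page is stored under "aws-docs" at a path that mirrors the URL path. Neither detail is confirmed by the description above, and the names local_path_for and save_all are hypothetical.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_path_for(url, out_dir="aws-docs"):
    """Map a documentation URL to a file path under out_dir.

    Assumption: the local layout mirrors the URL path.
    """
    # Drop empty and relative segments so the result stays inside out_dir.
    parts = [p for p in urlparse(url).path.split("/") if p and p not in (".", "..")]
    if not parts:
        parts = ["index.html"]
    return Path(out_dir).joinpath(*parts)


def save_all(urls_file="html_urls.json", out_dir="aws-docs"):
    """Download every URL listed in urls_file into out_dir."""
    # Assumption: the file holds a flat JSON list of URL strings.
    urls = json.loads(Path(urls_file).read_text())
    for url in urls:
        target = local_path_for(url, out_dir)
        if target.exists():
            continue  # already saved, so an interrupted run can resume
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url, timeout=30) as resp:
            target.write_bytes(resp.read())
```

Skipping files that already exist matters at this scale: with 100k+ documents, a run is likely to be interrupted and restarted at least once.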