Crawler-Selenium-Screenshot

The script crawls the URLs found on a page (up to 50).
It follows the internal links to crawl further.
It then filters the results to find the immediate URLs (assuming a typical navigation bar layout).
It creates a folder dynamically with a unique name.
It takes a full-page screenshot (PNG) of each immediate web page, names the files dynamically, and stores them in that folder.
It saves each web page locally.
It saves the PDF files locally.
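
The link-filtering steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this repository: the names `internal_links`, `is_pdf` and `MAX_URLS` are hypothetical, and "internal" is assumed to mean "same host as the starting page".

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

MAX_URLS = 50  # the cap mentioned above; the constant name is an assumption


class _LinkCollector(HTMLParser):
    """Collect the raw href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url, html, limit=MAX_URLS):
    """Return up to `limit` unique links on the same host as `base_url`."""
    collector = _LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in collector.hrefs:
        if len(result) >= limit:
            break
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc != base_host or absolute in seen:
            continue  # skip external links and duplicates
        seen.add(absolute)
        result.append(absolute)
    return result


def is_pdf(url):
    """Assumed heuristic: treat a URL as a PDF if its path ends in .pdf."""
    return urlparse(url).path.lower().endswith(".pdf")
```

For example, `internal_links("https://example.com/", html)` would keep `/about` and `/files/guide.pdf` but drop links to other hosts; `is_pdf` can then separate the pages to screenshot from the files to download.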

Output process

Folder containing Screenshots

You can run the Python file as is, once the required imports are installed. Running the Python notebook is probably the more interactive option.

Constraint: the chromedriver executable must be on your PATH. Download the chromedriver build for your OS (https://chromedriver.chromium.org/).
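
One way to confirm the PATH constraint before starting a crawl is a quick lookup with the standard library. This is only a sketch; `find_chromedriver` and `require_chromedriver` are hypothetical helper names, not functions from this repository.

```python
import shutil


def find_chromedriver(executable="chromedriver"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def require_chromedriver(executable="chromedriver"):
    """Return the executable's path, or fail early with a readable message."""
    path = find_chromedriver(executable)
    if path is None:
        raise FileNotFoundError(
            f"{executable!r} was not found on PATH; download it from "
            "https://chromedriver.chromium.org/ and add its folder to PATH."
        )
    return path
```

Calling `require_chromedriver()` at the top of a script turns a confusing driver start-up failure into an explicit error.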

Folder naming scheme: URL_Screenshots

File naming scheme: URL-page_slug
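
The two naming schemes could be produced along these lines. This is a sketch under stated assumptions rather than the repository's actual logic: it assumes "URL" means the host name without a leading `www.`, that the page slug is the URL path with non-alphanumeric runs replaced by underscores, and that the site root falls back to the slug `home`. All function names are hypothetical.

```python
import re
from urllib.parse import urlparse


def site_name(url):
    """Host name without a leading 'www.' (assumed meaning of 'URL')."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def folder_name(url):
    """Folder naming scheme: URL_Screenshots."""
    return f"{site_name(url)}_Screenshots"


def page_slug(url):
    """Turn the URL path into a file-system-friendly slug (assumed rule)."""
    path = urlparse(url).path.strip("/")
    slug = re.sub(r"[^A-Za-z0-9]+", "_", path).strip("_")
    return slug or "home"  # 'home' for the site root is an assumption


def screenshot_filename(url):
    """File naming scheme: URL-page_slug, saved as PNG."""
    return f"{site_name(url)}-{page_slug(url)}.png"
```

With these rules, `https://www.example.com/admissions/apply-now/` maps to the folder `example.com_Screenshots` and the file `example.com-admissions_apply_now.png`.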
