This repository contains two parts: Web Scraping and Image Scraping.
To scrape static webpages, run the static_webpage_scraping.ipynb notebook.
URL: Wikipedia Page
Dependencies: requests, pandas, bs4, lxml
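The notebook itself is not reproduced here. As a rough sketch of how these dependencies usually fit together for a static page, requests downloads the HTML and BeautifulSoup (with the lxml parser) turns a Wikipedia-style table into a pandas DataFrame. The function names, the "wikitable" class default and the User-Agent string are assumptions for illustration, not taken from the notebook.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML text."""
    # Wikipedia asks automated clients to send a descriptive User-Agent;
    # this value is a placeholder, not the one used by the notebook.
    headers = {"User-Agent": "static-webpage-scraping-example/0.1"}
    resp = requests.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx responses
    return resp.text


def table_to_dataframe(html: str, table_class: str = "wikitable",
                       parser: str = "lxml") -> pd.DataFrame:
    """Turn the first <table> carrying `table_class` into a DataFrame."""
    soup = BeautifulSoup(html, parser)
    # class_ matches any one of the element's classes, e.g. "wikitable sortable".
    table = soup.find("table", class_=table_class)
    if table is None:
        raise ValueError(f"no <table class='{table_class}'> found")
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    if not rows:
        return pd.DataFrame()
    header, body = rows[0], rows[1:]
    # Keep only rows that line up with the header; rows shortened by
    # rowspan/colspan merges are skipped rather than misaligned.
    body = [row for row in body if len(row) == len(header)]
    return pd.DataFrame(body, columns=header)
```

Typical use is table_to_dataframe(fetch_html(url)) with the Wikipedia page URL. pandas.read_html is a shorter alternative that also relies on lxml (or bs4 with html5lib), but parsing the table explicitly makes it easier to clean individual cells.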
To scrape interactive webpages, run the interactive_webpage_scraping.ipynb notebook.
URL:
Dependencies: selenium, splinter, webdriver_manager, pandas, bs4
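Interactive pages build part of their content with JavaScript, so a real browser has to render them before BeautifulSoup can parse the result. The sketch below assumes Chrome and Selenium 4 and drives the browser with Selenium directly. The notebook may use splinter instead, where browser.visit(url) followed by browser.html plays the same role. render_page, extract_texts and the wait_css selector are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from bs4 import BeautifulSoup


def render_page(url: str, wait_css: str, timeout: float = 15.0,
                headless: bool = True) -> str:
    """Load `url` in Chrome, wait for `wait_css` to appear, return the rendered HTML."""
    # Imported lazily so extract_texts() works without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless=new")
    # webdriver_manager downloads a chromedriver that matches the local Chrome.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                              options=options)
    try:
        driver.get(url)
        # Block until JavaScript has inserted the element we care about.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout


def extract_texts(html: str, css_selector: str,
                  parser: str = "html.parser") -> list[str]:
    """Return the stripped text of every element matching `css_selector`."""
    soup = BeautifulSoup(html, parser)
    return [el.get_text(strip=True) for el in soup.select(css_selector)]
```

Keeping rendering and parsing separate means the parsing step can be tested on saved HTML without launching a browser. The resulting list can be passed to pandas.DataFrame for tabular output.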
Images are scraped in two alternative ways: with BeautifulSoup and with Scrapy.
URL: Website
Dependencies: beautifulsoup4, scrapy, Pillow, shutil (shutil is part of the Python standard library and needs no installation)
To scrape images with BeautifulSoup, run the main.py file.
The scraped images are saved in the folder Image_Scraping -> images -> image_soup.
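main.py is not shown in this README. A minimal sketch of the BeautifulSoup approach, assuming the page's images are plain <img> tags, collects the image URLs, streams each file to disk with shutil.copyfileobj and uses Pillow to discard anything that is not a valid image. The helper names and the data-src fallback are assumptions. The output folder mirrors the path given above.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup
from PIL import Image

# Mirrors the README path; adjust if main.py writes elsewhere.
OUTPUT_DIR = Path("Image_Scraping") / "images" / "image_soup"


def find_image_urls(html: str, base_url: str, parser: str = "html.parser") -> list[str]:
    """Return absolute, de-duplicated image URLs found in <img> tags."""
    soup = BeautifulSoup(html, parser)
    urls: list[str] = []
    for img in soup.find_all("img"):
        # Some sites lazy-load images via data-src (an assumption, not universal).
        src = img.get("src") or img.get("data-src")
        if src and not src.startswith("data:"):  # skip inline base64 images
            absolute = urljoin(base_url, src)
            if absolute not in urls:
                urls.append(absolute)
    return urls


def download_image(url: str, out_dir: Path = OUTPUT_DIR, timeout: float = 10.0) -> Path | None:
    """Stream one image to `out_dir`; return its path, or None if it is not a valid image."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (Path(urlparse(url).path).name or "image")
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        resp.raw.decode_content = True  # undo gzip/deflate transfer encoding
        with open(target, "wb") as fh:
            shutil.copyfileobj(resp.raw, fh)  # copy without loading the whole file
    try:
        with Image.open(target) as im:
            im.verify()  # raises if the bytes are not a readable image
    except Exception:
        target.unlink(missing_ok=True)
        return None
    return target
```

Files are named after the last segment of the URL path, so two images with the same file name from different folders would overwrite each other. A hash-based name avoids that if it matters.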
To scrape images with Scrapy:
- Open a terminal inside the project folder.
- Run scrapy startproject scrape_image (this creates a folder named scrape_image).
- Add the main_spider.py file to the folder scrape_image -> scrape_image -> spiders.
- In the terminal, change the directory to scrape_image (cd scrape_image).
- Run scrapy crawl my_spider (this creates the folder image_scrapy -> full, which contains all the scraped images).