Web and Image Scraping

This repository contains two parts: Web Scraping and Image Scraping.

For scraping static webpages, run the static_webpage_scraping.ipynb file.

URL: Wikipedia Page

Dependencies: requests, pandas, bs4, lxml
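Here is a minimal sketch of the static approach. It assumes the target is the first table with class `wikitable` on a Wikipedia article. The notebook's actual page and selectors are not shown in this README, so both are assumptions. `fetch_html` and `parse_first_wikitable` are hypothetical helper names, and the sketch also assumes a simple table with no merged (`colspan`/`rowspan`) cells.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    # Wikimedia's User-Agent policy asks scripts to send a descriptive User-Agent.
    headers = {"User-Agent": "web-scraping-example/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def parse_first_wikitable(html: str) -> pd.DataFrame:
    """Turn the first <table class="wikitable"> into a DataFrame.

    Assumes a simple table: one header row and no merged (colspan/rowspan) cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any table whose class list contains "wikitable".
    table = soup.find("table", class_="wikitable")
    if table is None:
        raise ValueError("no table with class 'wikitable' found")
    rows = table.find_all("tr")
    # The first row holds the column names.
    columns = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if cells:  # skip empty spacer rows
            records.append(cells)
    return pd.DataFrame(records, columns=columns)
```

Calling `parse_first_wikitable(fetch_html(url))` with a Wikipedia article URL returns one DataFrame. The sketch uses Python's built-in `"html.parser"`. Passing `"lxml"` instead switches to the lxml parser from the dependency list.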

For scraping interactive webpages, run the interactive_webpage_scraping.ipynb file.

URL:

Dependencies: selenium, splinter, webdriver_manager, pandas, bs4
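Interactive pages build their content with JavaScript. The HTML therefore has to come from a real browser before BeautifulSoup can parse it. The sketch below makes these assumptions:

- Chrome is the browser.
- Selenium drives it, using a driver installed by webdriver_manager.
- BeautifulSoup parses the rendered page.

It calls Selenium directly rather than going through splinter, which wraps the same WebDriver. The helper names `render_page` and `extract_texts` are hypothetical.

```python
from bs4 import BeautifulSoup


def render_page(url: str, wait_seconds: float = 10.0) -> str:
    """Load a JavaScript-driven page in headless Chrome and return the rendered HTML."""
    # Imported here so extract_texts() can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    # webdriver_manager downloads a chromedriver matching the installed Chrome.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    try:
        driver.get(url)
        # Wait for <body>; a real page may need a more specific element to appear.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def extract_texts(html: str, selector: str) -> list[str]:
    """Collect the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]
```

For example, `extract_texts(render_page(url), "h3.title")` returns the text of every `<h3 class="title">`. The selector is illustrative, and the right one depends on the target page.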

Images are scraped in two ways: with BeautifulSoup and with Scrapy.

URL: Website

Dependencies: beautifulsoup4, scrapy, Pillow, shutil (shutil is part of the Python standard library and needs no installation)

To scrape images with BeautifulSoup, run the main.py file.

The scraped images are stored in the folder Image_Scraping -> images -> image_soup.
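Here is a sketch of what a BeautifulSoup image scraper such as main.py might do. It assumes the scraper:

- collects every `<img src>` on one page,
- resolves relative links,
- streams each file to disk with `shutil.copyfileobj`,
- uses Pillow to discard downloads that are not valid images.

The function names and the `images/image_soup` output path (taken as relative to Image_Scraping) are assumptions, not the repository's code.

```python
import os
import shutil
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def find_image_urls(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <img> that has a src attribute, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls: list[str] = []
    for img in soup.find_all("img", src=True):
        absolute = urljoin(base_url, img["src"])  # resolve relative paths like /a.png
        if absolute not in urls:
            urls.append(absolute)
    return urls


def download_images(urls: list[str], out_dir: str = "images/image_soup") -> list[str]:
    """Stream each image to out_dir and keep only the files Pillow can open."""
    from PIL import Image  # Pillow is only needed for the download step

    os.makedirs(out_dir, exist_ok=True)
    saved: list[str] = []
    for i, url in enumerate(urls):
        # Prefix with the index so two images with the same file name do not collide.
        name = f"{i}_{os.path.basename(urlparse(url).path) or 'image'}"
        path = os.path.join(out_dir, name)
        with requests.get(url, stream=True, timeout=30) as response:
            response.raise_for_status()
            response.raw.decode_content = True  # let urllib3 undo gzip/deflate encoding
            with open(path, "wb") as fh:
                shutil.copyfileobj(response.raw, fh)
        try:
            with Image.open(path) as im:
                im.verify()  # raises if the file is not a readable image
            saved.append(path)
        except Exception:
            os.remove(path)  # drop downloads that are not valid images
    return saved
```

A typical call is `download_images(find_image_urls(html, page_url))`, where `html` is the page fetched with requests.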

To scrape images with Scrapy:

1. Open a terminal inside the project folder.
2. Type scrapy startproject scrape_image (this creates a folder named scrape_image).
3. Add the main_spider.py file under scrape_image -> scrape_image -> spiders.
4. In the terminal, change the directory to scrape_image.
5. Type scrapy crawl my_spider (this creates the folder image_scrapy -> full, which contains all the scraped images).
