TorshaMajumder/web_and_image_scraping

Web and Image Scraping


This repository contains two parts: Web Scraping and Image Scraping.


Web Scraping:

For scraping static webpages, run the static_webpage_scraping.ipynb file.

URL: Wikipedia Page

Dependencies: requests, pandas, bs4, lxml
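A minimal sketch of the static workflow, assuming the data sits in the first `table.wikitable` element of a Wikipedia article. The notebook's exact page is not named here, so `URL` is a hypothetical placeholder, and this is not necessarily how the notebook does it. It fetches the HTML with requests, finds the table with BeautifulSoup, and loads the cell text into a pandas DataFrame. The built-in `"html.parser"` is used so the sketch runs anywhere; passing `"lxml"` instead uses the faster parser listed above.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical placeholder: the notebook's actual Wikipedia page is not given here.
URL = "https://en.wikipedia.org/wiki/Example"


def fetch_html(url: str) -> str:
    """Download a static page and raise on HTTP errors."""
    # Wikipedia asks clients to send a descriptive User-Agent header.
    response = requests.get(url, headers={"User-Agent": "static-scraper-example"}, timeout=30)
    response.raise_for_status()
    return response.text


def first_wikitable(html: str) -> pd.DataFrame:
    """Parse the first <table class="wikitable"> into a DataFrame.

    Assumes a simple table: the first row is the header and no cell uses
    rowspan/colspan (mismatched row lengths would raise an error).
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        raise ValueError("no wikitable found on the page")
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    header, *body = rows
    return pd.DataFrame(body, columns=header)
```

Usage: `first_wikitable(fetch_html(URL)).head()` shows the first rows of the scraped table.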

For scraping interactive webpages, run the interactive_webpage_scraping.ipynb file.

URL:

Dependencies: selenium, splinter, webdriver_manager, pandas, bs4
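For pages that build their content with JavaScript, one common pattern is to let a real browser render the page and then hand the rendered HTML to BeautifulSoup. The sketch below is an illustration, not the notebook's code. It assumes Chrome, a recent splinter release whose `Browser("chrome", ...)` accepts a Selenium `service` argument, a driver path obtained from webdriver_manager, and headless mode. The CSS selector is a hypothetical parameter, and since the page URL is not given above, none is filled in. The browser imports live inside `rendered_html` so the parsing helper also works on saved HTML without a browser installed.

```python
import pandas as pd
from bs4 import BeautifulSoup


def rendered_html(url: str, selector: str, wait_seconds: int = 10) -> str:
    """Load `url` in headless Chrome via splinter and return the rendered HTML."""
    # Imported here so texts_to_frame() works without a browser installed.
    from selenium.webdriver.chrome.service import Service
    from splinter import Browser
    from webdriver_manager.chrome import ChromeDriverManager

    # webdriver_manager downloads a matching chromedriver and returns its path.
    service = Service(ChromeDriverManager().install())
    browser = Browser("chrome", service=service, headless=True)
    try:
        browser.visit(url)
        # Wait up to wait_seconds for client-side rendering to produce the target elements.
        browser.is_element_present_by_css(selector, wait_time=wait_seconds)
        return browser.html
    finally:
        browser.quit()


def texts_to_frame(html: str, selector: str, column: str = "text") -> pd.DataFrame:
    """Collect the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return pd.DataFrame({column: [el.get_text(strip=True) for el in soup.select(selector)]})
```

Usage, with a hypothetical selector: `texts_to_frame(rendered_html(url, "div.item"), "div.item")`. Keeping rendering and parsing separate means the slow browser step runs once, and the parsing can be re-run or tested on the saved HTML.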


Image Scraping:

Images are scraped in two ways: with BeautifulSoup and with Scrapy.

URL: Website

Dependencies: beautifulsoup4, scrapy, Pillow, shutil (part of the Python standard library, so it does not need to be installed)

To scrape images with BeautifulSoup, run the main.py file.

The scraped images are stored in the folder Image_Scraping -> images -> image_soup.
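A minimal sketch of an image scraper along these lines (not necessarily how main.py works): BeautifulSoup collects the `src` of every `<img>` tag, `urllib.parse.urljoin` makes relative paths absolute, and each file is streamed to disk with `shutil.copyfileobj`. No HTTP client is listed for this part, so the standard-library `urllib.request` is an assumption here. The output path mirrors the folder named above.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Output folder taken from the README, relative to the working directory.
OUTPUT_DIR = Path("Image_Scraping") / "images" / "image_soup"


def image_urls(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def download(url: str, dest_dir: Path = OUTPUT_DIR) -> Path:
    """Stream one image into dest_dir, named after the last URL path segment.

    Files with the same name overwrite each other; a real scraper may want
    to de-duplicate names.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    name = Path(urlparse(url).path).name or "image"
    target = dest_dir / name
    with urllib.request.urlopen(url, timeout=30) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of loading into memory
    return target
```

Usage: `for url in image_urls(page_html, page_url): download(url)`, where `page_html` and `page_url` are the fetched page and its address. Pillow, also listed above, could then be used to check that each saved file opens as an image.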

To scrape images with Scrapy:

  • Open a terminal inside the project folder.
  • Run scrapy startproject scrape_image (this creates a scrape_image folder).
  • Add the main_spider.py file under scrape_image -> scrape_image -> spiders.
  • In the terminal, change the directory to scrape_image.
  • Run scrapy crawl my_spider, where my_spider is the spider's name attribute rather than its file name (this creates the folder image_scrapy -> full, which contains all the scraped images).
