Web Scraping Project: Content Extraction

Libraries and Frameworks

  • pandas: Used for data storage and manipulation.
  • BeautifulSoup: Used for interpreting the HTML document.
  • requests: Used to communicate with the web page.
  • sqlite3: Used to create the database instance.

Overview

This project scrapes a web page using the requests and BeautifulSoup libraries. It fetches the page, parses its HTML, identifies the relevant information, and extracts it for further use.

Features

1. Web Page Content Extraction

  • Use the requests library to fetch the HTML content of a specified web page.
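
A minimal sketch of the fetch step follows. The helper name `fetch_html` and the 10-second timeout are assumptions made for illustration, not names taken from this project.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the HTML body of `url` (hypothetical helper, not project code)."""
    # A timeout keeps the call from hanging forever on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("https://example.com")` (a placeholder URL) would return that page's HTML as a string. A URL without a scheme, such as `"not-a-url"`, makes requests raise `requests.exceptions.MissingSchema` before any network traffic is sent.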

2. HTML Analysis

  • Use BeautifulSoup to parse and analyze the HTML code of the fetched web page.
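
The parsing step might look like the sketch below. `SAMPLE_HTML` is an invented stand-in for a fetched page, and `parse_html` is a hypothetical helper name; the project's real target page is not stated here.

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for the fetched page.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Products</h1>
    <p class="intro">Two items are listed below.</p>
  </body>
</html>
"""


def parse_html(html: str) -> BeautifulSoup:
    # "html.parser" ships with Python, so no extra parser package is needed.
    return BeautifulSoup(html, "html.parser")


soup = parse_html(SAMPLE_HTML)
page_title = soup.title.string        # text inside <title>
heading = soup.find("h1").get_text()  # text of the first <h1>
# Listing the distinct tag names is a quick way to survey a page's structure.
tag_names = sorted({tag.name for tag in soup.find_all(True)})
```

Inspecting `tag_names` before writing selectors can help locate the elements that hold the data of interest.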

3. Information Extraction

  • Identify and extract relevant information from the HTML code based on specified criteria.
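
As one illustration of extraction by criteria, the sketch below pulls the rows of a table selected by its `id`. The table markup, the `products` id, and the `extract_rows` helper are all assumptions; the actual criteria depend on the page being scraped.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page structure will differ.
SAMPLE_HTML = """
<table id="products">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


def extract_rows(html: str, table_id: str) -> list[dict[str, str]]:
    """Return one dict per data row of the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []  # criterion not met: no such table on the page
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # The header row has no <td> cells, so it is skipped along with malformed rows.
        if len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows


records = extract_rows(SAMPLE_HTML, "products")
```

Returning a list of dictionaries keeps the extracted data independent of the HTML and easy to hand to pandas afterwards.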

4. Data Formatting

  • Save the extracted information as a CSV file.
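
The sketch below shows one way to format the extracted records with pandas and, since sqlite3 is among the listed libraries, to load them into a database as well. The sample records, the `products` table name, and the in-memory database are assumptions; this README does not describe the project's actual schema or output path.

```python
import sqlite3

import pandas as pd

# Invented records in the shape an extraction step might produce.
records = [
    {"Name": "Widget", "Price": "9.99"},
    {"Name": "Gadget", "Price": "24.50"},
]

df = pd.DataFrame(records)

# With no path argument, to_csv returns the CSV text instead of writing a file;
# pass a filename such as "output.csv" (hypothetical) to write to disk.
csv_text = df.to_csv(index=False)

# Optional: store the same rows in SQLite. ":memory:" avoids creating a file.
conn = sqlite3.connect(":memory:")
df.to_sql("products", conn, if_exists="replace", index=False)
db_rows = conn.execute(
    "SELECT Name, Price FROM products ORDER BY Name"
).fetchall()
conn.close()
```

Passing `index=False` keeps the DataFrame's row index out of both the CSV and the database table.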