Web Scraping Project: Content Extraction

Libraries and Frameworks

pandas: Used for data storage and manipulation.
BeautifulSoup: Used to parse the HTML document.
requests: Used to fetch the web page over HTTP.
sqlite3: Used to create the database instance.

Overview

This project uses the requests and BeautifulSoup libraries to scrape the contents of a web page. It fetches the page's HTML, identifies the relevant information in the markup, and extracts it for further use.

Features

1. Web Page Content Extraction

Use the requests library to fetch the HTML content of a specified web page.
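
The fetch step could be sketched as follows. This is a minimal illustration, not the code in this repository: the function name fetch_html, the optional session argument, and the 10-second timeout are assumptions made for the example.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the HTML text of `url` (hypothetical helper for illustration).

    `session` is optional so that a caller can reuse an existing
    requests.Session or substitute a stand-in during testing.
    """
    http = session if session is not None else requests.Session()
    response = http.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page.
    response.raise_for_status()
    return response.text


# Example call (the URL is a placeholder, not the page this project scrapes):
# html = fetch_html("https://example.com")
```

Setting a timeout and calling raise_for_status() keeps a slow or failing server from passing unnoticed into the parsing step.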

2. HTML Analysis

Use BeautifulSoup to parse and analyze the HTML code of the fetched web page.
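
As a rough sketch of the analysis step, the snippet below parses an HTML string and reports a few structural facts that help decide where the data lives. The helper names parse_html and summarize are hypothetical, and the built-in "html.parser" backend is assumed so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup


def parse_html(html):
    """Parse an HTML string into a BeautifulSoup tree."""
    return BeautifulSoup(html, "html.parser")


def summarize(soup):
    """Return a small overview of the page structure (illustrative only)."""
    return {
        # soup.title is None when the document has no <title> tag.
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "table_count": len(soup.find_all("table")),
        "row_count": len(soup.find_all("tr")),
    }
```

Inspecting counts like these before writing the extraction logic makes it easier to confirm that the expected table is actually present in the fetched page.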

3. Information Extraction

Identify and extract relevant information from the HTML code based on specified criteria.
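
One possible shape for the extraction step is shown below. It assumes, purely for illustration, that the target page lists films in a table whose first three cells per row are rank, title, and year; the column names, the extract_rows helper, and the limit parameter are not taken from this repository.

```python
from bs4 import BeautifulSoup


def extract_rows(html, limit=25):
    """Collect up to `limit` table rows as dictionaries.

    Assumed layout (hypothetical): each data row has at least three
    <td> cells holding rank, film title, and year, in that order.
    """
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 3:
            # Header rows use <th>, so they have no <td> cells and are skipped.
            continue
        records.append(
            {
                "Rank": cells[0].get_text(strip=True),
                "Film": cells[1].get_text(strip=True),
                "Year": cells[2].get_text(strip=True),
            }
        )
        if len(records) == limit:
            break
    return records
```

The "specified criteria" here are the minimum cell count and the row limit; a real page would likely need a more specific selector, such as a particular table or CSS class.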

4. Data Formatting

Save the extracted information as a CSV file.
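
A minimal sketch of the save step, using pandas to write the CSV file. Because sqlite3 is listed among the libraries, the example also writes the same rows to a SQLite table; that part, along with the save_records name, the file paths, and the "films" table name, is an assumption rather than a description of the project's actual code.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def save_records(records, csv_path, db_path, table_name="films"):
    """Write a list of dictionaries to a CSV file and a SQLite table."""
    df = pd.DataFrame(records)
    # index=False omits the DataFrame's row index from the output file.
    df.to_csv(csv_path, index=False)
    # closing() guarantees the connection is closed even if to_sql fails.
    with closing(sqlite3.connect(db_path)) as conn:
        # if_exists="replace" drops and recreates the table on every run.
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        conn.commit()
    return df
```

Replacing the table on each run keeps repeated scrapes from accumulating duplicate rows; appending instead would require a way to deduplicate.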

MaiMaiOkinawa/webscraping_project
