Web scraping is a technique to automatically access and extract large amounts of information from a website, which can save a huge amount of time and effort.
In this project, we use an Amazon search results page for a product (PS5) as the source of our raw data.
- Python 3.6+
- install BeautifulSoup
pip install beautifulsoup4
- install Requests
pip install requests
- install Pandas
pip install pandas
- the user-agent of your browser. To get the user-agent, just search for "my user agent" on Google and copy the user-agent string.
- A product search URL from Amazon
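As a minimal sketch of how the user-agent string from the prerequisites might be used, the snippet below puts it into the request headers. The names `build_headers` and `fetch_html`, the `Accept-Language` value, and the timeout are assumptions made for illustration and are not taken from the project files.

```python
import requests


def build_headers(user_agent):
    """Build request headers from the user-agent string copied from your browser."""
    return {
        "User-Agent": user_agent,
        # Assumption: asking for English pages keeps the page text consistent.
        "Accept-Language": "en-US,en;q=0.5",
    }


def fetch_html(url, user_agent, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, headers=build_headers(user_agent), timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(search_url, my_user_agent)` with your own search URL and user-agent string should return the raw HTML, provided the site does not block the request.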
- Importing the required libraries
- Specifying the URL containing the dataset and passing it to
requests.get()
to get the HTML content of the page.
- Using BeautifulSoup to parse the HTML content
- Extracting the required information from the data
- Saving the pandas dataframe as a CSV file called
Amazon Data.csv
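The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the notebook: `fetch_page`, `parse_html`, `extract_rows`, and `save_rows` are hypothetical names, and reading titles from `h2` elements is only a placeholder for the real extraction logic.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_page(url, headers):
    """Pass the URL to requests.get() and return the raw HTML content."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.content


def parse_html(html):
    """Parse the HTML content with BeautifulSoup."""
    return BeautifulSoup(html, "html.parser")


def extract_rows(soup):
    """Extract one row per product.

    Assumption: titles are read from <h2> elements purely for illustration;
    the real page layout may differ.
    """
    return [{"title": h2.get_text(strip=True)} for h2 in soup.find_all("h2")]


def save_rows(rows, path="Amazon Data.csv"):
    """Build a pandas DataFrame from the rows and save it as a CSV file."""
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)
    return df
```

Keeping the download separate from the parsing makes it possible to try the parsing and saving steps on a saved HTML file without sending a request.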
- It contains a Python file
scraper.py
which contains the rough code to be used in the final notebook file.
- It contains a Jupyter notebook file
Amazon Web Scraper.ipynb
which contains the final code used in the project.
- function to Extract Product Title
- function to Extract Product Price
- function to Extract Product Rating
- function to Extract Number of User Reviews
- function to Extract Product Availability
- It contains a csv file
Amazon Data.csv
which contains the final data extracted from the website.
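The five extraction functions listed above might look something like the sketch below. The function names, the element ids (`productTitle`, `acrCustomerReviewText`, `availability`), and the class names (`a-offscreen`, `a-icon-alt`) are assumptions for illustration. They are not confirmed by the project files, and Amazon's markup changes over time, so check them against the page you are scraping.

```python
from bs4 import BeautifulSoup


def _text(node, default=""):
    """Return the stripped text of a tag, or a default when the tag is missing."""
    return node.get_text(strip=True) if node is not None else default


def get_title(soup):
    # Assumed selector: <span id="productTitle">
    return _text(soup.find("span", attrs={"id": "productTitle"}))


def get_price(soup):
    # Assumed selector: <span class="a-offscreen">
    return _text(soup.find("span", attrs={"class": "a-offscreen"}))


def get_rating(soup):
    # Assumed selector: <span class="a-icon-alt">
    return _text(soup.find("span", attrs={"class": "a-icon-alt"}))


def get_review_count(soup):
    # Assumed selector: <span id="acrCustomerReviewText">
    return _text(soup.find("span", attrs={"id": "acrCustomerReviewText"}))


def get_availability(soup):
    # Assumed selector: <div id="availability">
    node = soup.find("div", attrs={"id": "availability"})
    return _text(node, default="Not Available")
```

Returning a default instead of raising when an element is missing keeps one incomplete product page from stopping the whole run.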
I am Mohd Mohitur Rahaman, and I am currently pursuing an MCA at KIIT University, Bhubaneswar. Before that, I completed a BSc in Mathematics at Malda College.
If you have any feedback, you can make the changes and create a pull request, or reach out to me here or on LinkedIn.