Lazada Review Scraper

🚀 Scrape product reviews from Lazada Indonesia based on categories 🏄‍

Features ✨

Based on the Amazon Cell Phones Reviews dataset project
Scrapes multiple categories and saves the results into one combined file and separate per-category files
Scrapes basic metadata along with ratings and reviews
Uses multiple Puppeteer pages as workers
Configurable timeout for rate-limit cooldowns (read more below)

Important Note 👀

Because Lazada's servers limit unusual request patterns, this scraper uses only one worker to scrape search results, while review scraping is set to five workers with a five-second timeout.
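
The worker and timeout arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `runWithWorkers`, `sleep`, and every parameter name are hypothetical, and the handler stands in for whatever a Puppeteer page would do with each task.

```typescript
// Hypothetical helper, not taken from the project's source.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `handler` over `tasks` with at most `workerCount` tasks in flight.
// Each worker waits `cooldownMs` after finishing a task before taking another.
async function runWithWorkers<T, R>(
  tasks: readonly T[],
  workerCount: number,
  cooldownMs: number,
  handler: (task: T, workerId: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task, shared by all workers

  const worker = async (workerId: number): Promise<void> => {
    while (next < tasks.length) {
      const index = next++; // claimed synchronously, so no two workers share a task
      results[index] = await handler(tasks[index], workerId);
      if (cooldownMs > 0) await sleep(cooldownMs); // cooldown before the next request
    }
  };

  // Never start more workers than there are tasks.
  const count = Math.max(1, Math.min(workerCount, tasks.length));
  const running: Promise<void>[] = [];
  for (let id = 0; id < count; id++) running.push(worker(id));
  await Promise.all(running);
  return results; // results keep the same order as `tasks`
}
```

Under these assumptions, the search step would correspond to `runWithWorkers(categories, 1, 5000, handler)` and the review step to `runWithWorkers(items, 5, 5000, handler)`. A shared index keeps all workers busy even when some pages load more slowly than others.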

More detailed documentation on this issue coming soon...

Download Data 📫

You can download pre-scraped datasets on Kaggle (Lazada Indonesian Reviews).

Manual Scrape 🔧

Requirements 📃

Node.js
Yarn (optional)

Packages Used 📦

puppeteer for browser-based scraping
prettier for formatting source code
ts-node for running TypeScript scripts

Steps 👨‍🔬

Preparation

Make sure the dependencies are downloaded by running npm install or yarn.
Copy config.default.ts to config.ts (config.ts is ignored by git) and customize the config variables in config.ts.
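
As a rough idea of what a customized config.ts might contain, here is a sketch. The interface, every key name, and the category value are hypothetical. The real keys are whatever config.default.ts defines, so check that file first. Only the worker counts and the five-second timeout come from the note above.

```typescript
// Hypothetical shape; the real keys are defined in config.default.ts.
interface ScraperConfig {
  categories: string[]; // category slugs to scrape (placeholder value below)
  searchWorkers: number; // workers used for search results
  reviewWorkers: number; // workers used for item reviews
  cooldownMs: number; // timeout between requests, in milliseconds
}

// In a real config.ts this object would be exported.
const config: ScraperConfig = {
  categories: ["example-category"], // placeholder, not a real Lazada category
  searchWorkers: 1, // one worker for search results, per the note above
  reviewWorkers: 5, // five workers for reviews
  cooldownMs: 5000, // five-second timeout
};
```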

Using Visual Studio Code

Open the project directory in Visual Studio Code.
Select and execute Scrape Search Results in the launch options on the Debug tab (exported to ./data/yyyymmdd-category-items.csv and ./data/yyyymmdd-items.csv).
Then select and execute Scrape Item Reviews (exported to ./data/yyyymmdd-category-reviews.csv and ./data/yyyymmdd-reviews.csv).

Using Command Line

Run npm run scrape:items or yarn scrape:items first to scrape initial item results (exported to ./data/yyyymmdd-category-items.csv and ./data/yyyymmdd-items.csv).
Then run npm run scrape:reviews or yarn scrape:reviews to scrape item reviews (exported to ./data/yyyymmdd-category-reviews.csv and ./data/yyyymmdd-reviews.csv).
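
The export paths above follow a date-category-kind pattern. A minimal sketch of how such paths could be built is shown below. `datePrefix` and `outputPaths` are hypothetical names used for illustration, and the use of the local date is an assumption.

```typescript
type Kind = "items" | "reviews";

// Formats a date as yyyymmdd using the local time zone (an assumption).
function datePrefix(date: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}`;
}

// Builds the per-category path and the combined path for one export.
function outputPaths(
  date: Date,
  category: string,
  kind: Kind,
): { perCategory: string; combined: string } {
  const prefix = datePrefix(date);
  return {
    perCategory: `./data/${prefix}-${category}-${kind}.csv`,
    combined: `./data/${prefix}-${kind}.csv`,
  };
}

// Example with a placeholder category name:
const example = outputPaths(new Date(2020, 0, 5), "phones", "items");
// example.perCategory is "./data/20200105-phones-items.csv"
// example.combined is "./data/20200105-items.csv"
```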

Available Scripts 📝

scrape:items

Scrapes and saves entry results for review scraping.

scrape:reviews

Scrapes and saves entry reviews based on scrape:items data.

format

Format all .ts files.

format:data

Format .json files in /data.

License 👮‍♂️

CC0 1.0 Universal
