This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Scrape product reviews from Lazada Indonesia based on categories

grikomsn/lazada-review-scraper

Lazada Review Scraper

🚀 Scrape product reviews from Lazada Indonesia based on categories 🏄‍

Features ✨

  • Based on the Amazon Cell Phones Reviews dataset project
  • Scrapes multiple categories and saves the results into one combined file and separate per-category files
  • Scrapes basic metadata with ratings and reviews
  • Uses multiple Puppeteer pages as workers
  • Configurable timeout for rate-limit cooldowns (read more below)

Important Note 👀

Because Lazada's servers limit unusual requests, this scraper uses only one worker to scrape search results, while the review scraping process is set to five workers with a five-second timeout.

More detailed documentation on this issue coming soon...
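Until that documentation exists, the worker setup described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `runWithWorkers`, `sleep`, and the `Task` type are hypothetical names, and it assumes the timeout acts as a pause between a worker's consecutive requests.

```typescript
// Hypothetical sketch of a fixed-size worker pool with a cooldown.
// None of these names come from the repository.
type Task<T> = () => Promise<T>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function runWithWorkers<T>(
  tasks: Task<T>[],
  workerCount: number,
  cooldownMs: number,
): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  let next = 0; // index of the next unclaimed task

  // Each worker claims tasks one at a time until none are left.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
      // Pause before claiming another task to stay under rate limits.
      if (next < tasks.length) {
        await sleep(cooldownMs);
      }
    }
  }

  const poolSize = Math.min(workerCount, tasks.length);
  const workers = Array.from({ length: poolSize }, () => worker());
  await Promise.all(workers);
  return results; // results keep the order of the input tasks
}
```

Under these assumptions, the review stage would correspond to something like `runWithWorkers(tasks, 5, 5000)`, and the search stage to the same call with a `workerCount` of 1.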

Download Data 📫

You can download pre-scraped datasets at Kaggle (Lazada Indonesian Reviews).

Manual Scrape 🔧

Requirements 📃

Packages Used 📦

Steps 👨‍🔬

Preparation

  • Make sure the dependencies are downloaded by running npm install or yarn.
  • Copy config.default.ts (this file is ignored by git) to config.ts and customize the config variables in config.ts.

Using Visual Studio Code

  • Open the project directory in Visual Studio Code.
  • Select and execute Scrape Search Results in the launch options on the Debug tab (exported to ./data/yyyymmdd-category-items.csv and ./data/yyyymmdd-items.csv).
  • Then select and execute Scrape Item Reviews (exported to ./data/yyyymmdd-category-reviews.csv and ./data/yyyymmdd-reviews.csv).

Using Command Line

  • Run npm run scrape:items or yarn scrape:items first to scrape initial item results (exported to ./data/yyyymmdd-category-items.csv and ./data/yyyymmdd-items.csv).
  • Then run npm run scrape:reviews or yarn scrape:reviews to scrape item reviews (exported to ./data/yyyymmdd-category-reviews.csv and ./data/yyyymmdd-reviews.csv).
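To make the export naming concrete, the sketch below builds the file paths listed in the steps above. It is only an illustration of the `yyyymmdd-category-kind.csv` pattern: `datePrefix`, `exportPaths`, and the `phones` category in the usage note are hypothetical, and the scraper's own code may build these paths differently.

```typescript
// Hypothetical helpers illustrating the export file naming pattern.
type ExportKind = "items" | "reviews";

// Left-pads a number below 10 with a zero, e.g. 5 -> "05".
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Formats a date as yyyymmdd using the local time zone.
function datePrefix(date: Date): string {
  return `${date.getFullYear()}${pad2(date.getMonth() + 1)}${pad2(date.getDate())}`;
}

// Returns the combined file path plus one path per category.
function exportPaths(
  date: Date,
  kind: ExportKind,
  categories: string[],
): { combined: string; perCategory: string[] } {
  const prefix = datePrefix(date);
  return {
    combined: `./data/${prefix}-${kind}.csv`,
    perCategory: categories.map((c) => `./data/${prefix}-${c}-${kind}.csv`),
  };
}
```

For example, `exportPaths(new Date(2019, 10, 5), "items", ["phones"])` yields `./data/20191105-items.csv` as the combined file and `./data/20191105-phones-items.csv` for the category.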

Available Scripts 📝

  • scrape:items

    Scrapes and saves entry results for review scraping.

  • scrape:reviews

    Scrapes and saves entry reviews based on scrape:items data.

  • format

    Formats all .ts files.

  • format:data

    Formats .json files in /data.

License 👮‍♂️

CC0 1.0 Universal