TS-Walmart-Scraper is a web scraping tool developed in TypeScript on top of the Crawlee library. It allows users to extract relevant data points from products on walmart.com. The scraper can be used with inputs such as category URLs, brand URLs, search keywords or specific product URLs.
Clone the repository
git clone https://github.com/<username>/TS-Walmart-Scraper.git
cd TS-Walmart-Scraper
Install dependencies
npm install
The input for the scraper is a JSON file named INPUT.json
, which should be located in the following directory: project_folder\storage\key_value_stores\default\
. The INPUT.json
file should contain the following fields:
productUrls
: An array of URLs for specific product pages to scrape.listingUrls
: An array of URLs for category pages or brand pages to scrape (that contains listing of products and pagination).keywords
: An array of search keywords to use when searching Walmart.com.maxPrice
: The maximum price for products to scrape.minPrice
: The minimum price for products to scrape.startPageNumber
: The page number to start scraping from.finalPageNumber
: The final page number to scrape.
Using 0
as value for minPrice
and maxPrice
indicates the scraper to collect products from all price ranges.
Using 0
as value for startPageNumber
and finalPageNumber
indicates the scraper to crawl all the page range.
To run the scraper, navigate to the project directory in your terminal and run the following command:
npm start
The output of the scraper will be a series of JSON files, one per product scraped, and will be located in the following directory: project_folder\storage\datasets\default
.
The output JSON files from TS-Walmart-Scraper includes all the following fields:
URL
: The URL of the product page.idCodes
: An object containing the unique identifier codes of the product, including theSKU
andUPC
.seller
: An object containing information about the seller and brand of the product, including thebrand
,brandURL
,seller
, andsellerURL
.title
: The title of the product.media
: An object containing URLs for images and videos of the product, including themain
image URL,gallery
array of image URLs, andvideos
array of video objects, each with atitle
andurl
field.pricing
: An object containing pricing information for the product, including thesalePrice
,fullPrice
, andcurrencySymbol
.isAvailable
: A boolean indicating whether the product is currently available.isGiftEligible
: A boolean indicating whether the product is eligible for gift-giving.isUsed
: A boolean indicating whether the product is used.rating
: An object containing rating information for the product and seller, including theitemRating
,itemReviews
,sellerRating
, andsellerReviews
.orderLimits
: An object containing minimum and maximum order limits for the product, including themin
andmax
fields.category
: An object containing information about the category of the product, including thefullPath
andpathParts
array of category objects, each with aname
andurl
field.info
: An object containing additional information about the product, including theshortDescription
,longDescription
, andspecifications
array of objects, each with anattribute
andvalue
field.variants
: An array containing information about different variants of the product, including objects with aisCurrentVariant
,url
,SKU
,isAvailable
,pricing
, andoptions
field. Theoptions
field contains an array of objects, each with anattribute
andvalue
field.
This project is licensed under the AGPL-3.0 license License. See the LICENSE file for details.
I hope you find this software useful and I would be honored if you fork this repository and collaborate with me to improve it. If you have any suggestions or find any bugs, please don't hesitate to open an issue or submit a pull request. Thanks for using TS-Walmart-Scraper!