- After creating a Python project, install the Scrapy package.
- Once Scrapy is installed, run the following commands:
  scrapy startproject vrbodata
  cd vrbodata
  scrapy genspider vrbo_spider vrbo.com
- Inside the spiders folder, a spider file is created that will do the crawling tasks.
- Keep the variable name start_urls as it is, but change its URL to "https://www.vrbo.com/vacation-rentals/beach/usa/florida"
- As per the requirements, we need the items below for each property:
  - Property name
  - Property details (bedrooms, bathrooms)
  - Property price
  - Property image
- Here, the property data is loaded dynamically through a separate request. The data also sits inside a JavaScript <script> tag as JSON. Splash, Playwright, and BeautifulSoup4 are different ways to get at that JSON: Splash and Playwright render the page, while BeautifulSoup4 parses the returned HTML.
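- Run this command in the terminal to start the crawl: scrapy crawl vrbos (the argument must match the spider's name attribute; the genspider command above generates a spider named vrbo_spider)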
- scrapy shell "https://www.vrbo.com/vacation-rentals/beach/usa/florida"
- view(response) #View Response in a browser
- If the desired data is hardcoded in JavaScript, you first need to get the JavaScript code. The raw page source, including any inline <script> content, is available as response.text
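As a rough illustration of the step above, the sketch below pulls a JSON object out of an inline <script> tag using only the standard library. The `SAMPLE_HTML` string stands in for `response.text`, and the variable name `window.__INITIAL_STATE__`, the `listings` key, and the field names are assumptions made for the example; the real VRBO markup may look different.

```python
import json
import re

# Hypothetical page fragment standing in for response.text.
SAMPLE_HTML = """
<html><body>
<script>window.__INITIAL_STATE__ = {"listings": [{"name": "Beach House", "bedrooms": 3, "bathrooms": 2, "price": "$250", "image": "https://example.com/a.jpg"}]};</script>
</body></html>
"""


def extract_embedded_json(html, marker="window.__INITIAL_STATE__"):
    """Return the JSON value assigned to `marker` in the page source, or None."""
    match = re.search(re.escape(marker) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses a single JSON value starting at the given index
    # and stops there, so the trailing JavaScript (";</script>") is ignored.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


state = extract_embedded_json(SAMPLE_HTML)
listings = state["listings"]
for listing in listings:
    print(listing["name"], listing["price"])
```

Inside a spider, the same function could be called with `response.text` from the `parse` method, provided the marker is adjusted to whatever the page actually uses.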
- items.py defines the item class, which holds the fields that can be stored in a database
- The item class should be instantiated inside the spider class
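One possible shape for the item is sketched below. The project's own items.py most likely subclasses `scrapy.Item`; this example swaps in a standard-library dataclass instead, which Scrapy also accepts as an item type, so that it runs without Scrapy installed. The class name `PropertyItem`, its field names, and the `build_item` helper are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PropertyItem:
    """One listing; field names are illustrative, not taken from the project."""
    name: Optional[str] = None
    bedrooms: Optional[int] = None
    bathrooms: Optional[int] = None
    price: Optional[str] = None
    image_url: Optional[str] = None


def build_item(listing):
    """Create the item from one listing dict, as a spider's parse method might."""
    return PropertyItem(
        name=listing.get("name"),
        bedrooms=listing.get("bedrooms"),
        bathrooms=listing.get("bathrooms"),
        price=listing.get("price"),
        image_url=listing.get("image"),
    )


# Assumed input shape: one entry from the JSON embedded in the page.
item = build_item({
    "name": "Beach House",
    "bedrooms": 3,
    "bathrooms": 2,
    "price": "$250",
    "image": "https://example.com/a.jpg",
})
print(asdict(item))
```

In a spider, `parse` would yield the object returned by `build_item` for every listing it finds.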
- Make sure the ITEM_PIPELINES setting is uncommented in settings.py
- Also make sure you are logged in to MySQL Workbench and the MySQL server is running
- Run mysql -u root -p to log in to the local MySQL server
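The pipeline and MySQL steps above could fit together roughly as follows. This is a sketch under several assumptions, not the project's actual pipeline: the class name `MySQLPipeline`, the table `properties`, the database `vrbo`, the column names, and the use of the mysql-connector-python driver are all illustrative choices.

```python
from dataclasses import asdict, is_dataclass

# Assumed table and column names; adjust them to the actual schema.
TABLE = "properties"
COLUMNS = ("name", "bedrooms", "bathrooms", "price", "image_url")


def build_insert(item):
    """Return a parameterized INSERT statement and its values for one item."""
    data = asdict(item) if is_dataclass(item) else dict(item)
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = f"INSERT INTO {TABLE} ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    return sql, tuple(data.get(column) for column in COLUMNS)


class MySQLPipeline:
    """Hypothetical pipeline that writes each item to a local MySQL server."""

    def open_spider(self, spider):
        # Imported here so the module still loads when the driver is missing.
        import mysql.connector  # provided by the mysql-connector-python package

        self.conn = mysql.connector.connect(
            host="localhost",
            user="root",
            password="your-password",  # placeholder, replace with real credentials
            database="vrbo",           # assumed database name
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        sql, values = build_insert(item)
        # The driver substitutes the %s placeholders, which avoids manual quoting.
        self.cursor.execute(sql, values)
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()
```

If this class lived in the project's pipelines.py, the ITEM_PIPELINES dictionary in settings.py would need an entry such as "vrbodata.pipelines.MySQLPipeline": 300 for Scrapy to call it.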