Scraping Fails: class names have been changed #4
Thank you very much for the thorough list of changes. At first I couldn't figure out why the original script returned empty data.
The most recent change as of 12/05/2020: `BUSINESS_CARD = "internal___1jK0Z wrapper___26yB4"`
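Because the business card now carries two classes, a single-class locator such as `By.CLASS_NAME` or an exact `@class="..."` XPath match may no longer find it. Here is a minimal sketch, assuming you locate elements by selector strings. The helpers `css_for_classes` and `xpath_for_classes` are hypothetical, not part of the original script. They build selectors that match an element carrying every listed class, regardless of any extra classes or their order.

```python
BUSINESS_CARD = "internal___1jK0Z wrapper___26yB4"


def css_for_classes(class_string: str) -> str:
    """Hypothetical helper: CSS selector matching elements that carry
    every class in a space-separated class string."""
    return "".join(f".{name}" for name in class_string.split())


def xpath_for_classes(tag: str, class_string: str) -> str:
    """Hypothetical helper: XPath matching <tag> elements that carry every
    class in class_string, even if they also have other classes."""
    conditions = " and ".join(
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in class_string.split()
    )
    return f"//{tag}[{conditions}]"


print(css_for_classes(BUSINESS_CARD))
# .internal___1jK0Z.wrapper___26yB4
print(xpath_for_classes("a", "wrapper___2rOTx"))
# //a[contains(concat(' ', normalize-space(@class), ' '), ' wrapper___2rOTx ')]

# With Selenium (driver is assumed to exist), for example:
# driver.find_elements(By.CSS_SELECTOR, css_for_classes(BUSINESS_CARD))
```

The `concat(' ', normalize-space(@class), ' ')` idiom matches whole class tokens, so `wrapper` would not accidentally match `wrapper___2rOTx`.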
The target website has changed its CSS files and class names, and the routing between pages has changed too. Here are the changes, in order from the start of the scraping script to the end:
- `{'class': 'category-object'}` --> `{'class': 'subCategory___BRUDy'}`
- `name = category.find('h3', {'class': 'sub-category__header'}).text` --> `name = category.get('id')`
- `{'class': 'sub-category-list'}` --> `{'class': 'subCategoryList___r67Qj'}`
- `{'class': 'child-category'}` --> `{'class': 'subCategoryItem___3ksKz'}`
- `sub_category_name = sub_category.find('a', {'class': 'sub-category-item'}).text` --> `sub_category_name = sub_category.find('a', {'class': 'navigation___2Efid'}).find('span').text`
- `{'class': 'sub-category-item'}` --> `{'class': 'navigation___2Efid'}`
- `'//a[@class="category-business-card card"]'` --> `'//a[@class="wrapper___2rOTx"]'`
- `'//a[@class="button button--primary next-page"]'` --> `'//a[@class="paginationLinkNormalize___scOgG paginationLinkNext___1LQ14"]'`
- `(By.CLASS_NAME, 'category-business-card card')` --> `(By.CLASS_NAME, 'wrapper___2rOTx')`
- `next_url = base_url + data[category][sub_category] + "?numberofreviews=0&timeperiod=0&status=all" + f'&page={c}'` --> `next_url = base_url + data[category][sub_category] + "?numberofreviews=0" + f'&page={c}' + "&status=all&timeperiod=0"`
- `(By.CLASS_NAME, 'category-business-card card')` --> `(By.CLASS_NAME, 'wrapper___2rOTx')` (second occurrence in the script)
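If the query parameters change again, building the paginated URL from a dict is less error-prone than concatenating strings by hand. Here is a minimal sketch using the standard library's `urllib.parse.urlencode`. It keeps the parameter order from the new URL in the list above. The function name `page_url`, the base URL, and the category path are hypothetical placeholders.

```python
from urllib.parse import urlencode


def page_url(base_url: str, category_path: str, page: int) -> str:
    """Hypothetical helper: build a paginated category URL with the
    query parameters listed in this issue."""
    params = {
        "numberofreviews": 0,
        "page": page,
        "status": "all",
        "timeperiod": 0,
    }
    # urlencode joins key=value pairs with single '&' separators, in dict order.
    return f"{base_url}{category_path}?{urlencode(params)}"


# Placeholder base URL and category path, for illustration only.
print(page_url("https://example.com", "/categories/some_category", 2))
# https://example.com/categories/some_category?numberofreviews=0&page=2&status=all&timeperiod=0
```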
Also, `tqdm_notebook` throws an `AttributeError` saying it has no attribute `'sp'`. That's understandable, since the notebook version is still experimental. Just replace `tqdm_notebook` with `tqdm` and it works!
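The swap described above amounts to importing the console progress bar instead of the notebook one. This is a minimal sketch. The page range and the `visited` list are hypothetical stand-ins for the script's pagination loop. The `ImportError` fallback is only there so the sketch runs without tqdm installed, and it shows no progress bar in that case.

```python
# Use tqdm's console progress bar instead of tqdm_notebook.
try:
    from tqdm import tqdm
except ImportError:
    # Fallback for this sketch only: iterate without a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


pages = range(1, 4)  # hypothetical page numbers
visited = []
for page in tqdm(pages, desc="pages"):
    visited.append(page)  # the real script would fetch and parse each page here
```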