
Scraper with Selenium config behaves differently with or without headless option #65

retdop opened this issue Feb 20, 2019 · 2 comments

retdop commented Feb 20, 2019

Crawling and testing are not working with the following configuration.

[screenshot from 2019-02-20 18-18-44: the spider configuration]

The XPath doesn't seem to match any element.

[screenshot from 2019-02-20 18-20-31: the empty XPath result]

The crawler works when I comment out `options_selenium.add_argument('headless')` in masterspider.py, line 101.

This is very odd, as chromedriver is supposed to behave identically with or without the headless flag.

PagesJaunes is known to have implemented scraping protections; this may be related.
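
For anyone reproducing this, here is a minimal sketch of the toggle in question, assuming a standard Selenium Chrome setup. The variable name follows masterspider.py; the XPath is a placeholder, not the one from the screenshot:

```python
from selenium import webdriver

options_selenium = webdriver.ChromeOptions()
options_selenium.add_argument('headless')  # commenting this out makes the crawl work

driver = webdriver.Chrome(options=options_selenium)
driver.get('https://www.pagesjaunes.fr/')

# Headless: returns an empty list; headful: matches as expected.
elements = driver.find_elements_by_xpath('//div[@class="placeholder"]')
print(len(elements))
driver.quit()
```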

retdop commented Feb 20, 2019

Enhancement suggestions:

  • add an option to start Chrome headless
  • in a spider's configuration form, add an option to set custom request headers (a rough sketch of the User-Agent workaround follows the link below)

See https://medium.com/@addnab/puppeteer-quick-fix-for-differences-between-headless-and-headful-versions-of-a-webpage-5b168bd5fe4a
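
The article's quick fix is to override the User-Agent, since headless Chrome advertises itself as "HeadlessChrome" and some sites serve different content when they see that. A sketch of the same idea in Selenium; the UA string is illustrative, not taken from the project:

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('headless')
# Present a regular (headful) Chrome User-Agent while running headless.
options.add_argument(
    'user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
)
driver = webdriver.Chrome(options=options)
```

Note this only changes the User-Agent, not arbitrary request headers, which is why the comment below still considers a proxy for the general case.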


retdop commented Feb 20, 2019

After some research, it turns out to be quite complicated to change request headers in Selenium (the easiest way is to route traffic through a local proxy...).

Also, there seem to be quite a few differences between Chrome and headless Chrome, possibly by design. So the best solution would actually be to offer an option to use Firefox (geckodriver) instead of Chrome, which actually solves the problem here (tested).
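
A minimal sketch of that alternative: headless Firefox via geckodriver, which behaved the same headless and headful on this site in the test above. Assumes geckodriver is on PATH:

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options_firefox = Options()
options_firefox.add_argument('-headless')  # Firefox's headless flag

driver = webdriver.Firefox(options=options_firefox)
driver.get('https://www.pagesjaunes.fr/')
driver.quit()
```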
