Support for custom user agents in is_live_page() #114
drFerg changed the title from "is_live_url is sometimes failing due to user agent blocking" to "is_live_page is sometimes failing due to user agent blocking" on Aug 29, 2024
adbar changed the title from "is_live_page is sometimes failing due to user agent blocking" to "Support for custom user agents in is_live_page()" on Aug 29, 2024
Hi @drFerg, definitely. Trafilatura supports custom user-agent settings, and courlan could do so as well. The config file approach could be replicated here. Are you interested in drafting a pull request?
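For illustration, here is a rough sketch of what a config-driven user agent for a liveness check might look like. It uses only the Python standard library and is not courlan's or Trafilatura's actual implementation. The config layout, the `USER_AGENTS` option as read here, and the helper names `load_user_agents`, `build_head_request` and `is_live` are all assumptions made for this example.

```python
import configparser
import random
import urllib.error
import urllib.request

# Hypothetical config layout with a multi-line USER_AGENTS option.
# This is an assumption for the example; courlan does not read such a file.
EXAMPLE_CONFIG = """
[DEFAULT]
USER_AGENTS =
    Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0
    ExampleCrawler/1.0 (+https://example.org/bot)
"""


def load_user_agents(config_text):
    """Return the non-empty lines of the USER_AGENTS option as a list."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser["DEFAULT"].get("USER_AGENTS", "")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def build_head_request(url, user_agents):
    """Build a HEAD request; set User-Agent only if agents are configured."""
    headers = {}
    if user_agents:
        # Pick one configured agent at random for this request.
        headers["User-Agent"] = random.choice(user_agents)
    return urllib.request.Request(url, headers=headers, method="HEAD")


def is_live(url, user_agents, timeout=10):
    """Return True if the HEAD request succeeds, False on any error."""
    try:
        request = build_head_request(url, user_agents)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors (4xx/5xx), network failures and malformed URLs
        # are all treated as "not live" in this sketch.
        return False
```

With this in place, a call such as `is_live("https://example.org/", load_user_agents(EXAMPLE_CONFIG))` would send one of the configured user agents instead of the library default. An empty list leaves the header unset, so the default behaviour is unchanged.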
Hi!
We're currently using courlan via trafilatura for some crawling. When running liveness checks on a host's URL, we are being blocked because of the user agent header, and we have no way to change it. I noticed that the redirection test, which is_live_page uses, contains some commented-out code that references user agent headers.
Is there any interest in supporting changing the headers or having a different one set?
Thanks.