Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explicit disallow feature #36

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

SimonC-Audigent
Copy link

@SimonC-Audigent SimonC-Audigent commented Sep 17, 2024

There are some scenarios, where we want to check if the robots.txt is explicitly disallowing the UA (wildcard not included). This a common behaviour for ads crawlers such as Google AdsBots: https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt, which will disregards wildcards.
To support that scenario, I added an explicit parameter, to the isDisallowed method.

@samclarke
Copy link
Owner

Thanks for the PR! That would be a useful feature to add.

Instead of adding it as a boolean, it might be better to make it a separate method instead. For example, the difference between:

if (robots.isDisallowed(url, 'my-ua', true)) {

and:

if (robots.isDisallowed(url, 'my-ua', false)) {

would require checking the documentation but something like:

if (robots.isExplicitlyDisallowed(url, 'my-ua')) {

is a little more obvious.

Also adding a parameter is not guaranteed to be backwards compatible. It's unlikely to break anything but there's always a possibility someone has done something like:

let results = arrayOfUserAgents.map(robots.isAllowed.bind(robots, 'some ur'))

which would be broken if added due to map passing the index as a parameter.

Copy link
Owner

@samclarke samclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a chance to look into this a bit more and added a couple of comments. It definitely seems like a feature worth adding.

Robots.js Show resolved Hide resolved
Robots.js Show resolved Hide resolved
@SimonC-Audigent
Copy link
Author

I refactored it to use isExplicitlyDisallowed and keep isDisallowed with the same signature, (which is what i implemented at the beginning and change my mind... but forgot to remove it from the PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants