Allow/Disallow rules not handled correctly #76

Open
ogolovanov opened this issue Sep 2, 2016 · 1 comment

@ogolovanov

From https://yandex.com/support/webmaster/controlling-robot/robots-txt.xml?lang=ru#simultaneous

The Allow and Disallow directives from the corresponding User-agent block are sorted according to URL prefix length (from shortest to longest) and applied in order. If several directives match a particular site page, the robot selects the last one in the sorted list. This way the order of directives in the robots.txt file doesn't affect how they are used by the robot.

Source robots.txt:

User-agent: Yandex
Allow: /
Allow: /catalog/auto
Disallow: /catalog

Sorted robots.txt:

User-agent: Yandex
Allow: /
Disallow: /catalog
Allow: /catalog/auto
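To make the quoted rule concrete, here is a minimal sketch of the sort-then-last-match logic. It is not the library's code: `isDisallowedBySortedRules` and the `type`/`path` array shape are hypothetical names chosen for illustration, only plain prefixes are handled (no `*` or `$` wildcards), and rules of equal length are not given any special treatment.

```php
<?php
// Hypothetical helper, not part of RobotsTxtParser.
// Sorts the rules by prefix length (shortest first) and lets the last
// matching rule decide, as described in the Yandex documentation quoted above.
function isDisallowedBySortedRules(array $rules, string $path): bool
{
    usort($rules, function (array $a, array $b): int {
        return strlen($a['path']) <=> strlen($b['path']);
    });

    $disallowed = false; // Assumption: a path matched by no rule is allowed.
    foreach ($rules as $rule) {
        // Plain prefix comparison only; wildcards are out of scope here.
        if (strncmp($path, $rule['path'], strlen($rule['path'])) === 0) {
            $disallowed = ($rule['type'] === 'disallow');
        }
    }
    return $disallowed;
}

// The rules from the source robots.txt above, in file order.
$rules = [
    ['type' => 'allow',    'path' => '/'],
    ['type' => 'allow',    'path' => '/catalog/auto'],
    ['type' => 'disallow', 'path' => '/catalog'],
];

var_dump(isDisallowedBySortedRules($rules, '/catalog/'));     // bool(true)
var_dump(isDisallowedBySortedRules($rules, '/catalog/auto')); // bool(false)
```

Under these assumptions `/catalog/` matches `Allow: /` and `Disallow: /catalog`, and the longer of the two is the Disallow rule, so the path is blocked regardless of the order in the file.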

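Code to reproduce:

// robots.txt content under test (same rules as above, for all user agents)
$c = <<<ROBOTS
User-agent: *
Allow: /
Allow: /catalog/auto
Disallow: /catalog
ROBOTS;

$r = new RobotsTxtParser($c);
$url = 'http://test.ru/catalog/';
// The longest rule matching /catalog/ is "Disallow: /catalog"
var_dump($r->isDisallowed($url));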

Result: false
Expected result: true

@JanPetterMG JanPetterMG added the bug label Sep 2, 2016
@LeMoussel

Google describes this differently:

At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined.

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt?hl=en#google-supported-non-group-member-records
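As a rough illustration of the "most specific rule" wording, the sketch below picks the matching rule with the longest path in a single pass, without sorting. `mostSpecificRule` and the `type`/`path` array shape are hypothetical, wildcards are ignored (the quote calls their precedence undefined), and keeping the earlier rule on a length tie is purely an assumption of this sketch.

```php
<?php
// Hypothetical helper, not part of RobotsTxtParser.
// Returns the matching rule with the longest path, or null if nothing matches.
function mostSpecificRule(array $rules, string $path): ?array
{
    $best = null;
    foreach ($rules as $rule) {
        $prefix = $rule['path'];
        if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
            continue; // This rule does not apply to the path.
        }
        // A strictly longer path wins; on a tie the earlier rule is kept (assumption).
        if ($best === null || strlen($prefix) > strlen($best['path'])) {
            $best = $rule;
        }
    }
    return $best;
}

// The rule group from the issue, in file order.
$group = [
    ['type' => 'allow',    'path' => '/'],
    ['type' => 'allow',    'path' => '/catalog/auto'],
    ['type' => 'disallow', 'path' => '/catalog'],
];

$match = mostSpecificRule($group, '/catalog/');
var_dump($match['type']); // string(8) "disallow"
```

For the plain-prefix rules in this issue, this selection also reports `/catalog/` as disallowed; the sketch says nothing about how wildcard rules should be ranked.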
