Regex meta char escape using preg_quote #59

Open
ranvis opened this issue Feb 23, 2016 · 4 comments

@ranvis

ranvis commented Feb 23, 2016

When the pattern is delimited as in preg_match('@...@'), the input should be escaped with preg_quote($rule, '@').
Currently, one of the following warnings occurs when a path contains a regex metacharacter:

PHP Warning: preg_match(): Compilation failed: missing ) at offset 15 in /path/to/vendor/t1gor/robots-txt-parser/source/robotstxtparser.php on line 836
PHP Warning: preg_match(): Compilation failed: unmatched parentheses at offset 1 in /path/to/vendor/t1gor/robots-txt-parser/source/robotstxtparser.php on line 836
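
For anyone reproducing this outside the library, here is a minimal sketch of the difference. The rule value is a hypothetical stand-in for a `Disallow:` path, and the snippet does not call any of the parser's own code; it only contrasts an unescaped pattern with one passed through `preg_quote($rule, '@')`.

```php
<?php
// Hypothetical rule, standing in for a path read from a "Disallow:" line.
$rule = '/(';

// Unescaped: '(' opens a group that is never closed, so compilation fails.
// The leading @ here is PHP's error-suppression operator, used only to keep
// the warning out of the output; preg_match() returns false on failure.
$unescaped = @preg_match('@' . $rule . '@', '/(');
var_dump($unescaped); // bool(false)

// Escaped: preg_quote() backslash-escapes regex metacharacters and, when
// given as the second argument, the delimiter as well.
$quoted = preg_quote($rule, '@');
$escaped = preg_match('@' . $quoted . '@', '/(');
var_dump($quoted);  // string(3) "/\("
var_dump($escaped); // int(1)
```

Passing the delimiter as the second argument matters because `preg_quote()` does not escape `@` on its own.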

@JanPetterMG
Collaborator

I've seen it in some rare cases, but unfortunately never had the time to investigate it... This is indeed a bug.

@JanPetterMG
Collaborator

Regex is not my expertise, but could this be as simple as using a character that is not valid in a URL as the delimiter instead of "@"?
All of the "@"s should already be escaped as far as I can see, but I'm clearly wrong about that... It's not my code, and I don't fully understand it either, to be honest...

@ranvis
Author

ranvis commented Feb 25, 2016

rawurlencode()ing paths, as the code currently does, is a good approach I think, since a URL may contain any character code.
But that doesn't make regex escaping unnecessary, as it is only URL escaping.
I only took a glance at the code, so I may be wrong about this.
Anyway, sorry for being lazy and not adding a failing case earlier. Tested on e1b052c.

require_once(__DIR__ . '/vendor/autoload.php');
$parser = new \RobotsTxtParser('User-agent: webcrawler
Disallow: /(
Disallow: /)
Disallow: /.
');
// Both comparisons should print bool(true); the actual output is shown.
var_dump($parser->isAllowed('/%5C.', 'webcrawler') == true); // bool(false)
var_dump($parser->isAllowed('/(', 'webcrawler') == false); // bool(false)
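
To illustrate the point that URL escaping and regex escaping are separate concerns, here is a small sketch using only built-in functions. The segment value is made up, and the library's own encode_url() may behave differently from plain `rawurlencode()`; the only claim is that `rawurlencode()` leaves `.` untouched, so it still acts as a wildcard in a pattern.

```php
<?php
// Hypothetical path segment containing a regex metacharacter.
$segment = 'a.b';

// rawurlencode() leaves unreserved characters such as '.' as they are.
$encoded = rawurlencode($segment);
var_dump($encoded); // string(3) "a.b"

// Without regex escaping, '.' matches any character, so "aXb" matches too.
$loose = preg_match('@^' . $encoded . '$@', 'aXb');
var_dump($loose); // int(1)

// With preg_quote(), '.' only matches a literal dot.
$strict = preg_match('@^' . preg_quote($encoded, '@') . '$@', 'aXb');
var_dump($strict); // int(0)
```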

@JanPetterMG
Collaborator

I just took another look at the issue. I'm unable to fix it (for now), but here is something for the next person who tries to fix it to build on...

    private function checkBasicRule($rule, $path)
    {
        $rule = $this->encode_url($rule);
        $rule = preg_quote($rule);
        // match result
        if (preg_match('@' . $rule . '@', $path)) {
            if (mb_stripos($rule, '$') !== false) {
                if (mb_strlen($rule) - 1 == mb_strlen($path)) {
                    return true;
                }
            } else {
                $this->log[] = "Rule match: Path";
                return true;
            }
        }
        return false;
    }

I'm not sure what the problem is, but I think this template is a good place to start...
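
One possible direction, as a rough sketch only: the function below is a hypothetical standalone helper (`matchBasicRule` is not a name from the library), it skips encode_url() and logging, and it assumes two things the template does not state: that a rule is meant to match from the start of the path, and that only a trailing `$` is an end anchor. The idea is to strip the `$` before quoting, so that `preg_quote()` does not turn it into a literal `\$` and change the rule's length.

```php
<?php
// Hypothetical helper, not the library's actual checkBasicRule().
function matchBasicRule(string $rule, string $path): bool
{
    // Assumption: a trailing '$' anchors the rule at the end of the path.
    $anchored = substr($rule, -1) === '$';
    if ($anchored) {
        $rule = substr($rule, 0, -1);
    }

    // Escape every regex metacharacter, including the '@' delimiter.
    // Assumption: rules match from the start of the path, hence '^'.
    $pattern = '@^' . preg_quote($rule, '@') . ($anchored ? '$' : '') . '@';

    return preg_match($pattern, $path) === 1;
}

var_dump(matchBasicRule('/(', '/('));    // bool(true)
var_dump(matchBasicRule('/.', '/x'));    // bool(false)
var_dump(matchBasicRule('/a', '/a/b'));  // bool(true)
var_dump(matchBasicRule('/a$', '/a/b')); // bool(false)
```

Wildcard (`*`) handling is left out on purpose, since the template above does not cover it either.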
