Cherry-pick "LibRegex: Only search start of line if pattern begins with ^" #25177

Draft: wants to merge 1 commit into base master


nico
Contributor

@nico nico commented Oct 26, 2024

@nico nico requested a review from alimpfard as a code owner October 26, 2024 14:32
@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Oct 26, 2024
@nico nico marked this pull request as draft October 26, 2024 15:41
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Oct 26, 2024
@nico
Contributor Author

nico commented Oct 28, 2024

@alimpfard looks like this causes 54.298 RegexLibC(374): Failed test 'simple_notbol_noteol' in 0ms. Is that expected?

@alimpfard
Member

The tests should pass. I guess we don't have enough tests in Ladybird (or this is a POSIX-specific issue).

@nico
Contributor Author

nico commented Nov 1, 2024

Want to take a look into what's going on?

(cherry picked from commit de588a97c011dbb6d4ee69bc37281870d49d3ce3)
@nico
Contributor Author

nico commented Nov 11, 2024

Here's a standalone demo:

#include <regex.h>
#include <stddef.h>
#include <stdio.h>

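// REG_NOERR may be missing from the system <regex.h>, so define it as 0 (success).
#define REG_NOERR 0

// Prints only the actual result `a`; the expected value `b` is not checked.
#define EXPECT_EQ(a, b) printf("%d\n", a)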

int main()
{
    const char pattern[] = "^hello friends$";
    const char pattern2[] = "hello friends";
    regex_t regex, regex2;

    EXPECT_EQ(regcomp(&regex, pattern, REG_EXTENDED | REG_NOSUB | REG_ICASE), REG_NOERR);
    EXPECT_EQ(regcomp(&regex2, pattern2, REG_EXTENDED | REG_NOSUB | REG_ICASE), REG_NOERR);

    EXPECT_EQ(regexec(&regex, "hello friends", 0, NULL, 0), REG_NOERR);

    EXPECT_EQ(regexec(&regex, "hello friends", 0, NULL, REG_NOTBOL), REG_NOMATCH);
    EXPECT_EQ(regexec(&regex, "hello friends", 0, NULL, REG_NOTEOL), REG_NOMATCH);
    EXPECT_EQ(regexec(&regex, "hello friends", 0, NULL, REG_NOTBOL | REG_NOTEOL), REG_NOMATCH);

    EXPECT_EQ(regexec(&regex, "a hello friends b", 0, NULL, REG_NOTBOL), REG_NOMATCH);
    EXPECT_EQ(regexec(&regex, "a hello friends", 0, NULL, REG_NOTBOL), REG_NOMATCH);
    //EXPECT_EQ(regexec(&regex, "a hello friends", 0, NULL, REG_NOTBOL | REG_SEARCH), REG_NOERR);
    //EXPECT_EQ(regexec(&regex, "a hello friends b", 0, NULL, REG_NOTBOL | REG_SEARCH), REG_NOERR);

    EXPECT_EQ(regexec(&regex, "a hello friends b", 0, NULL, REG_NOTEOL), REG_NOMATCH);
    EXPECT_EQ(regexec(&regex, "hello friends b", 0, NULL, REG_NOTEOL), REG_NOMATCH);
    //EXPECT_EQ(regexec(&regex, "hello friends b", 0, NULL, REG_NOTEOL | REG_SEARCH), REG_NOERR);
    //EXPECT_EQ(regexec(&regex, "a hello friends b", 0, NULL, REG_NOTEOL | REG_SEARCH), REG_NOMATCH);

    EXPECT_EQ(regexec(&regex, "a hello friends b", 0, NULL, REG_NOTBOL | REG_NOTEOL), REG_NOMATCH);
    //EXPECT_EQ(regexec(&regex, "a hello friends b", 0, NULL, REG_NOTBOL | REG_NOTEOL | REG_SEARCH), REG_NOMATCH);

    EXPECT_EQ(regexec(&regex2, "hello friends", 0, NULL, REG_NOTBOL), REG_NOMATCH);
    EXPECT_EQ(regexec(&regex2, "hello friends", 0, NULL, REG_NOTEOL), REG_NOMATCH);

    regfree(&regex);
    regfree(&regex2);
}

The failing lines are the commented-out ones. REG_SEARCH is a Serenity extension, so those lines don't work with my system libc.

I'm confused by those lines. We pass REG_NOTBOL, so the input string doesn't begin at start-of-line. But even without that flag the string shouldn't match, with or without REG_SEARCH, right?
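
For reference, a minimal sketch of how the two standard flags behave against a POSIX system libc, using only REG_NOTBOL and REG_NOTEOL (no REG_SEARCH). `exec_with_flags` is a hypothetical helper written for this illustration, not part of any library, and it reuses the compile flags from the demo above.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as an extended regex, run it on
 * `text` with the given regexec() flags, and return regexec()'s result
 * (0 on a match, REG_NOMATCH otherwise). Returns -1 if regcomp() fails. */
static int exec_with_flags(const char* pattern, const char* text, int eflags)
{
    regex_t regex;
    if (regcomp(&regex, pattern, REG_EXTENDED | REG_NOSUB | REG_ICASE) != 0)
        return -1;

    int rc = regexec(&regex, text, 0, NULL, eflags);
    regfree(&regex);
    return rc;
}
```

With the pattern "^hello friends$", calling this on "hello friends" with no flags should return 0, while passing REG_NOTBOL or REG_NOTEOL should return REG_NOMATCH, since the corresponding anchor can no longer match at that end of the string. "a hello friends" with REG_NOTBOL should be REG_NOMATCH as well.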

So it looks correct to me that we don't return REG_NOERR there (…or am I misreading the test?). But why did we pass it before this optimization? The commit message sounds like this is a perf optimization, not a correctness fix.
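
To illustrate why an optimization like the one in the commit title would normally only affect speed, here is a toy model. It is not LibRegex code: `toy_pattern` and `toy_search` are made-up names, the "pattern" is a plain literal, and there is no multiline mode. It assumes that `^` is checked at every start offset and that the optimization is just an early exit from the search loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT LibRegex code: a literal string plus a flag saying
 * whether the pattern was written with a leading '^'. */
struct toy_pattern {
    const char* literal;
    bool anchored_at_start;
};

/* Returns the offset of the first match in `text`, or -1 if there is none.
 * `not_bol` plays the role of REG_NOTBOL: offset 0 is not a line start.
 * `optimize` enables the early exit for patterns that begin with '^'.
 * `attempts` (may be NULL) receives the number of start offsets tried. */
static long toy_search(struct toy_pattern p, const char* text, bool not_bol,
    bool optimize, size_t* attempts)
{
    size_t text_len = strlen(text);
    size_t lit_len = strlen(p.literal);
    size_t tried = 0;
    long found = -1;

    for (size_t start = 0; start + lit_len <= text_len; ++start) {
        /* Early exit: without multiline mode, '^' can only match at
         * offset 0, so later offsets are not worth trying. */
        if (optimize && p.anchored_at_start && start > 0)
            break;
        ++tried;

        /* '^' only matches at a line start. */
        bool at_line_start = (start == 0) && !not_bol;
        if (p.anchored_at_start && !at_line_start)
            continue;

        if (strncmp(text + start, p.literal, lit_len) == 0) {
            found = (long)start;
            break;
        }
    }

    if (attempts)
        *attempts = tried;
    return found;
}
```

In this model the early exit changes only the number of attempts, never the result. If a real implementation gives different results with and without the early exit, the unoptimized path was presumably not rejecting `^` at later offsets.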

@nico
Contributor Author

nico commented Nov 11, 2024

(…and we still return REG_NOERR in the REG_NOTEOL | REG_SEARCH case, which still seems wrong?)
