Negative lookahead/trailing context #181

Open
pmetzger opened this issue Apr 10, 2017 · 9 comments

@pmetzger
Contributor

The r / s syntax allows me to specify trailing context that isn't included in the matched expression.

What's the easiest way to specify negative trailing context, that is, an r / s where the presence of s causes r not to be matched?

(I realize this feature might not exist as such.)

@skvadrik
Owner

In general negative lookahead is not supported.

You can have a pair of rules:

    expression / lookahead { FAILURE }
    expression { SUCCESS }

Otherwise, if your lookahead expression is simple, it might be possible to manually invert it: e.g. inverted [a-z] is [^a-z] and so on.
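To make the two workarounds concrete, here is a hand-written C sketch (not re2c-generated code) of what each would do for a hypothetical rule "digits not followed by a lowercase letter". It assumes a NUL-terminated buffer, a common re2c sentinel convention, so that `[^a-z]` also matches at end of input. The function names are illustrative only.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the run of decimal digits at the start of s. */
static size_t digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

static int is_lower(char c)
{
    return c >= 'a' && c <= 'z';
}

/* Workaround 1, the pair of rules:
 *   [0-9]+ / [a-z] { FAILURE }
 *   [0-9]+         { SUCCESS }
 * Returns the lexeme length on SUCCESS, 0 on FAILURE or no match. */
size_t match_pair_of_rules(const char *s)
{
    size_t n = digit_run(s);
    if (n == 0)
        return 0;
    if (is_lower(s[n]))
        return 0;   /* FAILURE: longer match (context included) and listed first */
    return n;       /* SUCCESS */
}

/* Workaround 2, the manually inverted lookahead:
 *   [0-9]+ / [^a-z] { SUCCESS }
 * The NUL terminator is not in [a-z], so digits at end of input still match. */
size_t match_inverted_class(const char *s)
{
    size_t n = digit_run(s);
    if (n == 0)
        return 0;
    return is_lower(s[n]) ? 0 : n;
}
```

For `123x` both functions return 0 and for `123+` both return 3. In a real re2c spec the difference is what happens next: with the pair of rules the FAILURE action runs and has to decide how to proceed, while with `[^a-z]` the rule simply does not match and other rules get a chance.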

Note that in re2c-0.16 trailing contexts might not work as expected (see #165). This bug is fixed in the devel branch; the changes will be merged in the next release.

@skvadrik
Owner

Of course, a pair of rules is not exactly the same as negative lookahead: the first rule may prevent shorter overlapping rules from matching.
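The difference shows up with numbers and ranges. The C sketch below (hand-written, not re2c output) models a hypothetical three-rule set that uses the pair-of-rules trick to keep `1.` from being a float when another `.` follows, and compares it with what a true negative lookahead would do. It assumes, as in lex, that trailing context counts toward the match length when the longest match is chosen; all names are illustrative.

```c
#include <ctype.h>
#include <stddef.h>

enum token { T_NONE, T_FAILURE, T_FLOAT, T_INT };

static size_t digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Pair of rules, in priority order:
 *   [0-9]+ "." / "." { FAILURE }
 *   [0-9]+ "."       { FLOAT }
 *   [0-9]+           { INT }
 * *len receives the lexeme length (trailing context excluded). */
enum token lex_pair(const char *s, size_t *len)
{
    size_t n = digits(s);
    *len = 0;
    if (n == 0)
        return T_NONE;
    if (s[n] == '.' && s[n + 1] == '.') {
        *len = n + 1;        /* lexeme "1."; the second "." is context */
        return T_FAILURE;    /* n + 2 with context beats INT's n, so INT never fires */
    }
    if (s[n] == '.') {
        *len = n + 1;
        return T_FLOAT;
    }
    *len = n;
    return T_INT;
}

/* What a true negative lookahead would give:
 *   [0-9]+ "." (not followed by ".") { FLOAT }
 *   [0-9]+                           { INT }
 * The float rule just fails to match, and the shorter INT rule wins. */
enum token lex_negative(const char *s, size_t *len)
{
    size_t n = digits(s);
    *len = 0;
    if (n == 0)
        return T_NONE;
    if (s[n] == '.' && s[n + 1] != '.') {
        *len = n + 1;
        return T_FLOAT;
    }
    *len = n;
    return T_INT;
}
```

On `1..5` the pair of rules ends in the FAILURE action, whereas negative lookahead would yield INT `1` and leave `..5` for the next rules.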

@pmetzger
Contributor Author

Cool. I could do what you suggest very easily, and I believe my case will not tickle the bug in question. How, though, do you reject a match from inside a rule's action? The manual does not make it obvious. (Perhaps this is another thing that could be documented?)

@pmetzger
Contributor Author

pmetzger commented Apr 10, 2017

Actually, I just realized that the problem with shorter overlapping rules would bite me here. This is somewhat irritating.

What I need to express is this, approximately:

[0-9]+ is an integer
[0-9]+ "." [0-9]* is a float (note that it can end in a ".")
[0-9]+ ".." is not a float (".." is used in the language to mark the start of a range).

Unfortunately, it turns out that otherwise relatively obvious rules like

    [0-9]+ "." / [^.]

won't quite work, because floats can have suffixes and the regexps start getting really messy, though the fact that one of those matches would be longer might save me.

@skvadrik
Owner

This is a very common situation; I would probably do the following. First, list all lexemes explicitly:

    [0-9]+ { return INT; }
    [0-9]+ "." [0-9]* { return FLOAT1; }
    [0-9]+ [eE] [+-]? [0-9]+ { return FLOAT2; }
    ...
    [0-9]+ ".." [0-9]+ { return RANGE; }

Then deal with each type of lexeme in its own way, depending on what exactly you are trying to achieve. If you describe your problem in more detail, I might suggest something more suitable.
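A hand-written C sketch of how longest match resolves a rule list like this one: one matcher per rule, plus a loop that keeps the longest match and prefers the earlier rule on ties. It assumes the exponent form is `[0-9]+ [eE] [+-]? [0-9]+` (sign after the `e`, exponent required so that FLOAT2 does not overlap INT) and leaves out the elided `...` rules; all names are illustrative, not re2c API.

```c
#include <ctype.h>
#include <stddef.h>

enum token { T_NONE, T_INT, T_FLOAT1, T_FLOAT2, T_RANGE };

static size_t digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Each matcher returns the length of its rule's match at s, or 0. */
static size_t m_int(const char *s) { return digits(s); }  /* [0-9]+ */

static size_t m_float1(const char *s)  /* [0-9]+ "." [0-9]* */
{
    size_t n = digits(s);
    if (n == 0 || s[n] != '.')
        return 0;
    return n + 1 + digits(s + n + 1);
}

static size_t m_float2(const char *s)  /* [0-9]+ [eE] [+-]? [0-9]+ */
{
    size_t n = digits(s), k;
    if (n == 0 || (s[n] != 'e' && s[n] != 'E'))
        return 0;
    n++;
    if (s[n] == '+' || s[n] == '-')
        n++;
    k = digits(s + n);
    return k ? n + k : 0;
}

static size_t m_range(const char *s)   /* [0-9]+ ".." [0-9]+ */
{
    size_t n = digits(s), k;
    if (n == 0 || s[n] != '.' || s[n + 1] != '.')
        return 0;
    k = digits(s + n + 2);
    return k ? n + 2 + k : 0;
}

/* Longest match wins; on a tie the rule listed first wins. */
enum token lex_number(const char *s, size_t *len)
{
    size_t (*const rules[])(const char *) = { m_int, m_float1, m_float2, m_range };
    const enum token toks[] = { T_INT, T_FLOAT1, T_FLOAT2, T_RANGE };
    enum token best = T_NONE;
    size_t best_len = 0, i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {
            best_len = n;
            best = toks[i];
        }
    }
    *len = best_len;
    return best;
}
```

Because RANGE is listed as its own lexeme, `1..5` (length 4) beats FLOAT1 `1.` (length 2). With only these rules, though, an input like `1..x` (a range whose upper bound is not a literal) still lexes as FLOAT1 `1.`.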

This is how re2c can be used to lex C++: http://re2c.org/examples/example_07.html (this includes recognizing and parsing integers, floats and strings).

@pmetzger
Contributor Author

I solved my particular problem by doing the equivalent of:

    D = [0-9];
    D+ "." / [^.]

for floats of the form D+ ".", with neighboring rules adjusted accordingly so that this was the only way to match a float with a bare trailing dot. Since this is the shortest match for floats (and thus the lowest in the priority list), it works, though it is slightly messy.
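A hand-written C equivalent of this approach, as a sketch. The neighboring rule `D+ "." D+` is a hypothetical stand-in for "neighboring rules adjusted accordingly", the trailing-dot rule is listed last as described, and the input is assumed to be NUL-terminated so that `[^.]` also matches at end of input.

```c
#include <ctype.h>
#include <stddef.h>

enum token { T_NONE, T_INT, T_FLOAT };

static size_t digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Hypothetical rules in priority order (D = [0-9]):
 *   D+            { INT }
 *   D+ "." D+     { FLOAT }
 *   D+ "." / [^.] { FLOAT }   bare trailing dot, listed last
 * *len receives the lexeme length; trailing context is not consumed. */
enum token lex_num(const char *s, size_t *len)
{
    size_t n = digits(s);
    *len = 0;
    if (n == 0)
        return T_NONE;
    if (s[n] == '.') {
        size_t frac = digits(s + n + 1);
        if (frac > 0) {               /* D+ "." D+ (wins ties with the last rule) */
            *len = n + 1 + frac;
            return T_FLOAT;
        }
        if (s[n + 1] != '.') {        /* D+ "." / [^.] */
            *len = n + 1;
            return T_FLOAT;
        }
    }
    *len = n;                         /* D+, e.g. the "1" in "1..5" */
    return T_INT;
}
```

With these rules `1.` and `1.x` give FLOAT `1.`, while `1..5` gives INT `1` and leaves `..` for the range rule.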

A true negative trailing context system might be nice, but I seem to be fine without it for the moment.

Interestingly, I found some sort of weird problem when I parenthesized that expression. I have reported it separately as a bug.

@skvadrik
Owner

Glad you solved your problem!

@pmetzger
Contributor Author

Indeed. Though if there's a wishlist file of some sort, negative trailing context might be a nice thing to add. :) (I get that it is unlikely to show up any time soon.)

@skvadrik
Owner

Sure, I will leave the bug open as a reminder.
