Ambuguity and Lookahead #222
-
Hello, @alecthomas. I've been playing with participle for a while now, great work! I found that sometimes participle is unable to properly distinguish productions in set of alternatives, when they both share same prefix. Sometimes lookahead helps, sometimes it just follows the wrong node (the first suitable) and eventually fails, arbitrary long lookaheads do not help here. Intuitively I conclude, that sometimes, especially when we have a lot of nested Productions, the lookaheads at some point are already exhausted, and parser cannot distinguish proper Node from another. Does stateful lexer resets lookahead count entering the group? Is it a right thing to use Stateful lexer to help disambiguate in such nested cases? Can you please brefly describe the internal limits of disambiguation ability of participle and how does it count lookahead limit in recursive parsing? |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 1 reply
-
Do you have concrete examples? |
Beta Was this translation helpful? Give feedback.
-
Sorry for the chaotic message above. Here I provide a narrowed case, in which lookahead can't help. May be you can guide me on why this is the case. package main
import (
"testing"
"github.com/alecthomas/participle/v2"
"github.com/alecthomas/participle/v2/lexer"
"github.com/alecthomas/repr"
)
// --------- lexer
var Lexer = lexer.MustSimple([]lexer.SimpleRule{
{"Reserved", `(?i)(and|or)`},
{"Number", `(\d*\.)?\d+`},
{"Ident", `[a-zA-Z_]\w*`},
{`Punct`, `[+-/*(),><=]`},
{"whitespace", `\s+`},
})
// --------- grammar
type Literal struct {
Number *string `@Number`
}
type Call struct {
Func string `@Ident`
Args []*Term `"(" ( @@ ( "," @@ )* )? ")"`
}
type Term struct {
Literal *Literal ` @@`
Call *Call `| @@`
Variable *string `| @Ident`
}
type Comparison struct {
Left *Term `@@`
Op string `@( "="|"<" "="| "<" |">" "="| ">" )`
Right *Term `@@`
}
type ConditionTerm struct {
Call *Call ` @@`
Comparison *Comparison `| @@ `
}
type OpConditionTerm struct {
Op string `@("and"|"or")`
ConditionTerm *ConditionTerm `@@`
}
type Condition struct {
First *ConditionTerm `@@`
Rest []*OpConditionTerm `@@*`
}
// --------- test runner
func testParser(t *testing.T, target interface{}, lookahead int, inputs []string) {
// create parser
basicParser := participle.MustBuild(target,
participle.Lexer(Lexer),
participle.UseLookahead(lookahead),
participle.CaseInsensitive("Reserved"),
)
// run cases
for _, input := range inputs {
t.Run(input, func(t *testing.T) {
err := basicParser.ParseString("", input, target)
if err != nil {
// print so-far parsed AST before failing
repr.Println(target)
t.Fatal(err)
}
})
}
}
// ---------- this test succeeds
func TestComparison(t *testing.T) {
testParser(t, &Comparison{}, 1, []string{
`X=Y(1)`, // good
`b(x)=1`, // good
})
}
// ---------- this test fails to distinguish 'b(x)' (CALL) from 'b(x)=1' (COMPARISON)
func TestCondition(t *testing.T) {
testParser(t, &Condition{}, 100, []string{
`X=Y(1) or Good()`, // good
`b(x)=1`, // FAIL! but string is same to TestComparison() case, which succeeds
})
} The test output: go test
Partially parsed AST:
&main.Condition{
First: &main.ConditionTerm{
Call: &main.Call{
Func: "b",
Args: []*main.Term{
{
Variable: &"x",
},
},
},
},
}
--- FAIL: TestCondition (0.00s)
--- FAIL: TestCondition/b(x)=1 (0.00s)
parser_test.go:74: 1:5: unexpected token "=" |
Beta Was this translation helpful? Give feedback.
This is correctly failing, AFAICT, because Condition -> ConditionTerm and ConditionTerm has the following:
The
Call
matches, but notComparison
which is what would allow=
to be parsed correctly. So basically your grammar is not structured correctly for the input.