This library allows to construct backtracking top down packrat parsers in Go using parser combination. Packrat parsing enables the parsing of PEG Grammars in linear time. Parsers are combinated using the following basic parsers:
AtomParser
: Matches only a specified UTF8 stringRegexParser
: Matches a regular expressionAndParser
: Matches a given list of parsers sequentiallyOrParser
: Matches if any of a given list of parsers matchesKleeneParser
: Matches a parser 0 ton
times, optionally separated by another parserManyParser
: Matches a parser 1 ton
times, optionally separated by another parserMaybeParser
: Matches a parser 0 or 1 timesEmptyParser
: Does not read any input and matches in every caseEndParser
: Matches only if the scanner has reached the end of the input string
By default, Atom
and Regex
parsers skip (but do not match on) leading whitespace. This can be configured per parser.
If a parser matches, it returns an syntax tree *Node
. Every node points to the parser that produced it, the matched text, and a list of child nodes. AST callbacks are not provided atm, so a full syntax tree traversal is needed to process the parse results.
To construct recursive parsers, create parser combinators with nil
as the sub parser. After creating the sub parser that itself uses the parent parser, use the Set
function on the parent parser to update its children. The JSON parser provides an example for this.
This library is currently used in production, but some rarely used features may be broken. Additional documentation is ToDo.
A full example in form of a working json parser is provided in json_test.go.
import (
packrat "github.com/launix-de/go-packrat"
)
func main(){
input := "Hello World"
scanner := NewScanner(input, true)
helloParser := NewAtomParser("Hello", true)
worldParser := NewAtomParser("World", true)
helloAndWorldParser := NewAndParser(helloParser, worldParser)
n, err := Parse(helloAndWorldParser, scanner)
// n is the AST root node
// n.Children is a slice containing a node for both the Hello and World parser
}
Using this library, you can dynamically define and parse PEG grammars at runtime. Parsing time is proportional to the input length and grammar complexity. Note that if you do not need to build grammars at runtime, a parser generator like gocc will produce a static LR parser that is both faster and uses less memory than go-packrat
. If you want to use a parser combinator, worst-case exponential runtime is not a problem or low memory consumption is required, consider using goparsec. go-packrat
was written as a replacement for goparsec
that solves the time complexity issue using a simple memoization cache.
Dual licensed with custom aggreements or GPLv3. See LICENSE.