Skip to content

kareman/FootlessParser

Repository files navigation

Carthage compatible

Swift 5, 4.2 and 4.1 | Swift 2.2+

FootlessParser

FootlessParser is a simple and pretty naive implementation of a parser combinator in Swift. It enables infinite lookahead, non-ambiguous parsing with error reporting.

There is a series of blog posts about the development and documentation from the source code.

Introduction

In short, FootlessParser lets you define parsers like this:

let parser = function1 <^> parser1 <*> parser2 <|> parser3

function1 and parser3 return the same type.

parser will pass the input to parser1 followed by parser2, pass their results to function1 and return its result. If that fails it will pass the original input to parser3 and return its result.

Definitions

Parser

A function which takes some input (a sequence of tokens) and returns either the output and the remaining unparsed part of the input, or an error description if it fails.

Token

A single item from the input. Like a character from a string, an element from an array or a string from a list of command line arguments.

Parser Input

Most often text, but can also be an array or really any collection of anything, provided it conforms to CollectionType.

Parsers

The general idea is to combine very simple parsers into more complex ones. So char("a") creates a parser which checks if the next token from the input is an "a". If it is it returns that "a", otherwise it returns an error. You can then use operators and functions like zeroOrMore and optional to create ever more complex parsers. For more check out the full list of functions.

Operators

<^> (map)

function <^> parser1 creates a new parser which runs parser1. If it succeeds it passes the output to function and returns the result.

<*> (apply)

function <^> parser1 <*> parser2 creates a new parser which first runs parser1. If it succeeds it runs parser2. If that also succeeds it passes both outputs to function and returns the result.

The <*> operator requires its left parser to return a function and is normally used together with <^>. function must take 2 parameters of the correct types, and it must be curried, like this:

func function (a: A) -> (B) -> C

This is because <*> returns the output of 2 parsers and it doesn't know what to do with them. If you want them returned in a tuple, an array or e.g. added together you can do so in the function before <^> .

If there are 3 parsers and 2 <*> the function must take 3 parameters, and so on.

<*

The same as the <*> above, except it discards the result of the parser to its right. Since it only returns one output it doesn't need to be used together with <^> . But you can of course if you want the output converted to something else.

*>

The same as <* , but discards the result of the parser to its left.

<|> (choice)

parser1 <|> parser2 <|> parser3

This operator tries all the parsers in order and returns the result of the first one that succeeds.

>>- (flatmap)

parser1 >>- ( o -> parser2 )

This does the same as the flatmap functions in the Swift Standard Library. It creates a new parser which first tries parser1. If it fails it returns the error, if it succeeds it passes the output to the function which uses it to create parser2. It then runs parser2 and returns its output or error.

Example

Real life usage

CSV parser

let delimiter = "," as Character
let quote = "\"" as Character
let newline = "\n" as Character

let cell = char(quote) *> zeroOrMore(not(quote)) <* char(quote)
	<|> zeroOrMore(noneOf([delimiter, newline]))

let row = extend <^> cell <*> zeroOrMore(char(delimiter) *> cell) <* char(newline)
let csvparser = zeroOrMore(row)

Here a cell (or field) either:

  • begins with a ", then contains anything, including commas, tabs and newlines, and ends with a " (both quotes are discarded)
  • or is unquoted and contains anything but a comma or a newline.

Each row then consists of one or more cells, separated by commas and ended by a newline. The extend function joins the cells together into an array. Finally the csvparser collects zero or more rows into an array.

To perform the actual parsing:

do {
    let output = try parse(csvparser, csvtext)
    // output is an array of all the rows,
    // where each row is an array of all its cells.
} catch {

}

Recursive expression

func add(a: Int) -> (Int) -> Int { return { b in a + b } }
func multiply(a: Int) -> (Int) -> Int { return { b in a + b } }

let nr = { Int($0)! } <^> oneOrMore(oneOf("0123456789"))

var expression: Parser<Character, Int>!

let factor = nr <|> lazy (char("(") *> expression <* char(")"))

var term: Parser<Character, Int>!
term = lazy (multiply <^> factor <* char("*") <*> term <|> factor)

expression = lazy (add <^> term <* char("+") <*> expression <|> term)

do {
	let result = try parse(expression, "(1+(2+4))+3")
} catch {

}

expression parses input like "12", "1+2+3", "(1+2)", "12*3+1" and "12*(3+1)" and returns the result as an Int.

All parsers which refer to themselves directly or indirectly must be pre-declared as variable implicitly unwrapped optionals (var expression: Parser<Character, Int>!). And to avoid infinte recursion the definitions must use the lazy function.

Installation

Using Carthage

github "kareman/FootlessParser"

Then run carthage update.

Then follow the installation instructions in Carthage's README.

Using CocoaPods

Add FootlessParser to your Podfile file.

pod 'FootlessParser', '~> 0.5.2'

Then run pod install to install it.