Merge pull request #42 from datalust/dev
2.0.0 Release
nblumhardt authored Jun 10, 2018
2 parents c3bf716 + 232d725 commit c4797db
Showing 46 changed files with 2,534 additions and 345 deletions.
148 changes: 90 additions & 58 deletions README.md
@@ -1,23 +1,45 @@
# Superpower [![Build status](https://ci.appveyor.com/api/projects/status/7bj6if6tyc68urpy?svg=true)](https://ci.appveyor.com/project/datalust/superpower) [![Join the chat at https://gitter.im/datalust/superpower](https://img.shields.io/gitter/room/datalust/superpower.svg)](https://gitter.im/datalust/superpower) [![NuGet Version](https://img.shields.io/nuget/vpre/Superpower.svg?style=flat)](https://www.nuget.org/packages/Superpower/) [![Stack Overflow](https://img.shields.io/badge/stackoverflow-superpower-orange.svg)](http://stackoverflow.com/questions/tagged/superpower)

A [parser combinator](https://en.wikipedia.org/wiki/Parser_combinator) library based on [Sprache](https://github.com/sprache/Sprache). Superpower generates friendlier error messages through its support for token-based parsers.
A [parser combinator](https://en.wikipedia.org/wiki/Parser_combinator) library based on
[Sprache](https://github.com/sprache/Sprache). Superpower generates friendlier error messages through its support for
token-driven parsers.

![Logo](https://raw.githubusercontent.com/datalust/superpower/dev/asset/Superpower-White-200px.png)

### What is Superpower?

The job of a parser is to take a sequence of characters as input, and produce a data structure that's easier
for a program to analyze, manipulate, or transform. From this point of view, a parser is just a function from `string`
to `T` - where `T` might be anything from a simple number, a list of fields in a data format, or the abstract syntax
tree of some kind of programming language.

Just like other kinds of functions, parsers can be built by hand, from scratch. This is-or-isn't a lot of fun, depending
on the complexity of the parser you need to build (and how you plan to spend your next few dozen nights and weekends).

Superpower is a library for writing parsers in a declarative style that mirrors
the structure of the target grammar. Parsers built with Superpower are fast, robust, and report precise and
informative errors when invalid input is encountered.

### Usage

Superpower is embedded directly into your program code, without the need for any additional tools or build-time code generation tasks.
Superpower is embedded directly into your C# program, without the need for any additional tools or build-time code
generation tasks.

```shell
dotnet add package Superpower
```

The simplest parsers consume characters directly from the source text:
The simplest _text parsers_ consume characters directly from the source text:

```csharp
// Parse any number of capital 'A's in a row
var parseA = Character.EqualTo('A').AtLeastOnce();
```

The `Character.EqualTo()` method is a built-in parser. The `AtLeastOnce()` method is a _combinator_, that builds a more complex parser for a sequence of `'A'` characters out of the simple parser for a single `'A'`.
The `Character.EqualTo()` method is a built-in parser. The `AtLeastOnce()` method is a _combinator_, that builds a more
complex parser for a sequence of `'A'` characters out of the simple parser for a single `'A'`.
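As a rough illustration of how a parser like this might be applied, the sketch below wraps it in a small helper class. The `ParseAExample` class and its `Run` and `Matches` methods are hypothetical names introduced here for illustration; `Parse()` and `TryParse()` are the extension methods Superpower provides for text parsers.

```csharp
using Superpower;
using Superpower.Parsers;

// Hypothetical wrapper for illustration; not part of Superpower itself.
public static class ParseAExample
{
    // One or more capital 'A's; AtLeastOnce() collects the matches into a char[].
    static readonly TextParser<char[]> ParseA = Character.EqualTo('A').AtLeastOnce();

    // Returns the matched run of 'A's; Parse() throws a ParseException on failure.
    public static string Run(string input) => new string(ParseA.Parse(input));

    // TryParse() reports failure through the returned result instead of throwing.
    public static bool Matches(string input) => ParseA.TryParse(input).HasValue;
}
```

Under these assumptions, `ParseAExample.Run("AAA")` should give back `"AAA"`, while `ParseAExample.Matches("B")` should be `false` because no `'A'` is present.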

Superpower includes a library of simple parsers and combinators from which sophisticated parsers can be built:
Superpower includes a library of simple parsers and combinators from which more sophisticated parsers can be built:

```csharp
TextParser<string> identifier =
@@ -30,11 +52,15 @@ var id = identifier.Parse("abc123");
Assert.Equal("abc123", id);
```

Parsers are highly modular, so smaller parsers can be built and tested independently of the larger parsers built from them.
Parsers are highly modular, so smaller parsers can be built and tested independently of the larger parsers that use
them.

### Tokenization

A token-driven parser consumes elements from a list of tokens. The type used to represent the kinds of tokens consumed by a parser is generic, but currently Superpower has deeper support for `enum` tokens and using them is recommended.
Along with text parsers that consume input character-by-character, Superpower supports _token parsers_.

A token parser consumes elements from a list of tokens. A token is a fragment of the input text, tagged with the
kind of item that fragment represents - usually specified using an `enum`:

```csharp
public enum ArithmeticExpressionToken
@@ -44,7 +70,10 @@ public enum ArithmeticExpressionToken
Plus,
```

Token-driven parsing occurs in two distinct steps:
A major benefit of driving parsing from tokens, instead of individual characters, is that errors can be reported in
terms of tokens - _unexpected identifier \`frm\`, expected keyword \`from\`_ - instead of the cryptic _unexpected `m`_.

Token-driven parsing takes place in two distinct steps:

1. Tokenization, using a class derived from `Tokenizer<TKind>`, then
2. Parsing, using a function of type `TokenListParser<TKind>`.
@@ -65,63 +94,42 @@ var eval = expressionTree.Compile();
Console.WriteLine(eval()); // -> 5
```

#### Writing tokenizers
#### Assembling tokenizers with `TokenizerBuilder<TKind>`

The job of a _tokenizer_ is to split the input into a list of tokens - numbers, keywords, identifiers, operators -
while discarding irrelevant trivia such as whitespace or comments.

Superpower provides the `TokenizerBuilder<TKind>` class to quickly assemble tokenizers from _recognizers_,
text parsers that match the various kinds of tokens required by the grammar.

A simple arithmetic expression tokenizer is shown below:

```csharp
class ArithmeticExpressionTokenizer : Tokenizer<ArithmeticExpressionToken>
{
readonly Dictionary<char, ArithmeticExpressionToken> _operators =
new Dictionary<char, ArithmeticExpressionToken>
{
['+'] = ArithmeticExpressionToken.Plus,
['-'] = ArithmeticExpressionToken.Minus,
['*'] = ArithmeticExpressionToken.Times,
['/'] = ArithmeticExpressionToken.Divide,
['('] = ArithmeticExpressionToken.LParen,
[')'] = ArithmeticExpressionToken.RParen,
};

protected override IEnumerable<Result<ArithmeticExpressionToken>> Tokenize(TextSpan span)
{
var next = SkipWhiteSpace(span);
if (!next.HasValue)
yield break;

do
{
ArithmeticExpressionToken charToken;

if (char.IsDigit(next.Value))
{
var integer = Numerics.Integer(next.Location);
next = integer.Remainder.ConsumeChar();
yield return Result.Value(ArithmeticExpressionToken.Number,
integer.Location, integer.Remainder);
}
else if (_operators.TryGetValue(next.Value, out charToken))
{
yield return Result.Value(charToken, next.Location, next.Remainder);
next = next.Remainder.ConsumeChar();
}
else
{
yield return Result.Empty<ArithmeticExpressionToken>(next.Location,
new[] { "number", "operator" });
}

next = SkipWhiteSpace(next.Location);
} while (next.HasValue);
}
}
var tokenizer = new TokenizerBuilder<ArithmeticExpressionToken>()
.Ignore(Span.WhiteSpace)
.Match(Character.EqualTo('+'), ArithmeticExpressionToken.Plus)
.Match(Character.EqualTo('-'), ArithmeticExpressionToken.Minus)
.Match(Character.EqualTo('*'), ArithmeticExpressionToken.Times)
.Match(Character.EqualTo('/'), ArithmeticExpressionToken.Divide)
.Match(Character.EqualTo('('), ArithmeticExpressionToken.LParen)
.Match(Character.EqualTo(')'), ArithmeticExpressionToken.RParen)
.Match(Numerics.Natural, ArithmeticExpressionToken.Number)
.Build();
```

The tokenizer itself can use `TextParser<T>` parsers as recognizers, as in the `Numerics.Integer` example above.
Tokenizers constructed this way produce a list of tokens by repeatedly attempting to match recognizers
against the input in top-to-bottom order.
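To make the resulting token list a little more concrete, here is a minimal sketch that runs a built tokenizer over a string and describes each token. The `SketchToken` enum and the `TokenizerSketch` class are hypothetical, cut down to two token kinds so that the example is self-contained; they are not the types used elsewhere in this README.

```csharp
using System.Collections.Generic;
using System.Linq;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

// Hypothetical, reduced token kinds for illustration only.
public enum SketchToken { Number, Plus }

public static class TokenizerSketch
{
    // Recognizers are tried in the order they are registered.
    static readonly Tokenizer<SketchToken> Instance =
        new TokenizerBuilder<SketchToken>()
            .Ignore(Span.WhiteSpace)
            .Match(Character.EqualTo('+'), SketchToken.Plus)
            .Match(Numerics.Natural, SketchToken.Number)
            .Build();

    // Tokenize() throws a ParseException if no recognizer matches the input.
    public static List<string> Describe(string input) =>
        Instance.Tokenize(input)
            .Select(token => $"{token.Kind} '{token.ToStringValue()}'")
            .ToList();
}
```

With this setup, `TokenizerSketch.Describe("1 + 23")` should produce three entries, a `Number`, a `Plus`, and another `Number`, with the whitespace between them dropped.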

#### Writing tokenizers by hand

Tokenizers can alternatively be written by hand; this can provide the most flexibility, performance, and control,
at the expense of more complicated code. A handwritten arithmetic expression tokenizer is included in the test suite,
and a more complete example can be found [here](https://github.com/serilog/serilog-filters-expressions/blob/dev/src/Serilog.Filters.Expressions/Filters/Expressions/Parsing/FilterExpressionTokenizer.cs).
#### Writing token list parsers

Token parsers are defined in the same manner as text parsers, but consume tokens from a token list rather than characters from a string:
Token parsers are defined in the same manner as text parsers, using combinators to build up more sophisticated parsers
out of simpler ones.

```csharp
class ArithmeticExpressionParser
@@ -169,7 +177,9 @@ class ArithmeticExpressionParser

### Error messages

The [error scenario tests](https://github.com/datalust/superpower/blob/dev/test/Superpower.Tests/ErrorMessageScenarioTests.cs) demonstrate some of the error message formatting capabilities of Superpower. Check out the parsers referenced in the tests for some examples.
The [error scenario tests](https://github.com/datalust/superpower/blob/dev/test/Superpower.Tests/ErrorMessageScenarioTests.cs)
demonstrate some of the error message formatting capabilities of Superpower. Check out the parsers referenced in the
tests for some examples.

```csharp
ArithmeticExpressionParser.Lambda.Parse(new ArithmeticExpressionTokenizer().Tokenize("1 + * 3"));
@@ -191,7 +201,8 @@ public enum ArithmeticExpressionToken

### Performance

Superpower is built with performance as a priority. Less frequent backtracking, combined with the avoidance of allocations and indirect dispatch, mean that Superpower can be quite a bit faster than Sprache.
Superpower is built with performance as a priority. Less frequent backtracking, combined with the avoidance of
allocations and indirect dispatch, mean that Superpower can be quite a bit faster than Sprache.

Recent benchmark for parsing a long arithmetic expression:

@@ -216,6 +227,27 @@ Type=ArithmeticExpressionBenchmark Mode=Throughput

Benchmarks and results are included in the repository.

**Tips:** if you find you need more throughput: 1) consider a hand-written tokenizer, and 2) avoid the use of LINQ comprehensions and instead use chained combinators like `Then()` and especially `IgnoreThen()` - these allocate fewer delegates (closures) during parsing.
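The second tip can be sketched as follows: the same small parser written once as a LINQ comprehension and once with `IgnoreThen()`. The `HashNumber` class and the `#123` input format are made up for illustration, and no performance measurements are implied by this sketch.

```csharp
using Superpower;
using Superpower.Model;
using Superpower.Parsers;

// Hypothetical example: parse a '#' followed by a natural number, e.g. "#123".
public static class HashNumber
{
    // LINQ comprehension form: readable, but built from delegates that run during parsing.
    static readonly TextParser<TextSpan> ViaLinq =
        from _ in Character.EqualTo('#')
        from digits in Numerics.Natural
        select digits;

    // Chained combinator form: discards the '#' and keeps only the digits.
    static readonly TextParser<TextSpan> ViaIgnoreThen =
        Character.EqualTo('#').IgnoreThen(Numerics.Natural);

    public static string ParseWithLinq(string input) =>
        ViaLinq.Parse(input).ToStringValue();

    public static string ParseWithIgnoreThen(string input) =>
        ViaIgnoreThen.Parse(input).ToStringValue();
}
```

Both forms should accept the same input and return the same digits; the chained version simply avoids the intermediate range variable for the discarded `'#'`.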

### Examples

Superpower is introduced, with a worked example, in [this blog post](https://nblumhardt.com/2016/09/superpower/).
**Example** parsers to learn from:

* [_JsonParser_](https://github.com/datalust/superpower/tree/dev/sample/JsonParser/Program.cs) is a complete, annotated
example implementing the [JSON spec](https://json.org) with good error reporting
* [_DateTimeTextParser_](https://github.com/datalust/superpower/tree/dev/sample/DateTimeTextParser) shows how Superpower's text parsers work, parsing ISO-8601 date-times
* [_IntCalc_](https://github.com/datalust/superpower/tree/dev/sample/IntCalc) is a simple arithmetic expression parser (`1 + 2 * 3`) included in the repository, demonstrating how Superpower token parsing works
* [_Plotty_](https://github.com/SuperJMN/Plotty) implements an instruction set for a RISC virtual machine
**Real-world** projects built with Superpower:

* [_Serilog.Filters.Expressions_](https://github.com/serilog/serilog-filters-expressions) uses Superpower to implement a filtering language for structured log events
* The query language of [Seq](https://getseq.net) is implemented using Superpower
_Have an example we can add to this list? [Let us know](https://github.com/datalust/superpower/issues/new)._
### Getting help

Please post issues [to the issue tracker](https://github.com/datalust/superpower/issues), visit our [Gitter chat](https://gitter.im/datalust/superpower), or tag your [question on StackOverflow](http://stackoverflow.com/questions/tagged/superpower) with `superpower`.
7 changes: 7 additions & 0 deletions Superpower.sln
@@ -43,6 +43,8 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "IntCalc", "sample\IntCalc\I
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "DateTimeParser", "sample\DateTimeTextParser\DateTimeParser.csproj", "{A842DA99-4EAB-423D-B532-7902FED0D8F1}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "JsonParser", "sample\JsonParser\JsonParser.csproj", "{5C9AB721-559A-4617-B990-2D9EE85BEB7C}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -69,6 +71,10 @@ Global
{A842DA99-4EAB-423D-B532-7902FED0D8F1}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A842DA99-4EAB-423D-B532-7902FED0D8F1}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A842DA99-4EAB-423D-B532-7902FED0D8F1}.Release|Any CPU.Build.0 = Release|Any CPU
{5C9AB721-559A-4617-B990-2D9EE85BEB7C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{5C9AB721-559A-4617-B990-2D9EE85BEB7C}.Debug|Any CPU.Build.0 = Debug|Any CPU
{5C9AB721-559A-4617-B990-2D9EE85BEB7C}.Release|Any CPU.ActiveCfg = Release|Any CPU
{5C9AB721-559A-4617-B990-2D9EE85BEB7C}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
@@ -79,6 +85,7 @@ Global
{1A9C8D7E-4DFC-48CD-99B0-63612197E95F} = {2ED926D3-7AC8-4BFD-A16B-74D942602968}
{34BBD428-8297-484E-B771-0B72C172C264} = {7533E145-1C93-4348-A70D-E68746C5438C}
{A842DA99-4EAB-423D-B532-7902FED0D8F1} = {7533E145-1C93-4348-A70D-E68746C5438C}
{5C9AB721-559A-4617-B990-2D9EE85BEB7C} = {7533E145-1C93-4348-A70D-E68746C5438C}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {F3941419-6499-4871-BEAA-861F4FE5D2D4}
2 changes: 1 addition & 1 deletion sample/DateTimeTextParser/DateTimeParser.csproj
@@ -1,7 +1,7 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp1.1</TargetFramework>
<TargetFramework>netcoreapp2.0</TargetFramework>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\..\src\Superpower\Superpower.csproj" />
112 changes: 42 additions & 70 deletions sample/DateTimeTextParser/DateTimeTextParser.cs
@@ -4,79 +4,51 @@
using Superpower.Model;
using Superpower.Parsers;

namespace DateTimeTextParser
namespace DateTimeParser
{
static public class DateTimeTextParser
{
static TextParser<char[]> Repeat(this TextParser<char> parser, int count)
{
return input =>
{
List<char> result = new List<char>();
public static class DateTimeTextParser
{
static TextParser<int> IntDigits(int count) =>
Character.Digit
.Repeat(count)
.Select(chars => int.Parse(new string(chars)));

static TextParser<int> TwoDigits { get; } = IntDigits(2);
static TextParser<int> FourDigits { get; } = IntDigits(4);

static TextParser<char> Dash { get; } = Character.EqualTo('-');
static TextParser<char> Colon { get; } = Character.EqualTo(':');
static TextParser<char> TimeSeparator { get; } = Character.In('T', ' ');

static TextParser<DateTime> Date { get; } =
from year in FourDigits
from _ in Dash
from month in TwoDigits
from __ in Dash
from day in TwoDigits
select new DateTime(year, month, day);

static TextParser<TimeSpan> Time { get; } =
from hour in TwoDigits
from _ in Colon
from minute in TwoDigits
from second in Colon
.IgnoreThen(TwoDigits)
.OptionalOrDefault()
select new TimeSpan(hour, minute, second);

Result<char> next = input.ConsumeChar();
var beginning = next.Location;
static TextParser<DateTime> DateTime { get; } =
from date in Date
from time in TimeSeparator
.IgnoreThen(Time)
.OptionalOrDefault()
select date + time;

for (int i = 0; i < count; i++)
{
var parserResult = parser.Invoke(next.Location);
if (parserResult.HasValue)
{
result.Add(parserResult.Value);
next = next.Remainder.ConsumeChar();
}
else
return Result.Empty<char[]>(input);
}
static TextParser<DateTime> DateTimeOnly { get; } = DateTime.AtEnd();

return Result.Value(result.ToArray(), beginning, next.Location);
};
public static DateTime Parse(string input)
{
return DateTimeOnly.Parse(input);
}


static TextParser<string> TwoDigits =
Character.Digit.Repeat(2).Select(chs => new String(chs));

static TextParser<string> YearOfDate =
Character.Digit.Repeat(4).Select(chs => new String(chs));

static TextParser<string> MonthOfDate =
TwoDigits;

static TextParser<string> DayOfDate =
TwoDigits;

static TextParser<DateTime> Date =
from year in YearOfDate.Select(Int32.Parse)
from sep1 in Character.EqualTo('-')
from mon in MonthOfDate.Select(Int32.Parse)
from sep2 in Character.EqualTo('-')
from day in DayOfDate.Select(Int32.Parse)
select new DateTime(year, mon, day);

static TextParser<int> secondWithSep =
from sep in Character.EqualTo(':')
from second in TwoDigits.Select(Int32.Parse)
select second;

static TextParser<TimeSpan> Time =
from hour in TwoDigits.Select(Int32.Parse)
from sep1 in Character.EqualTo(':')
from minute in TwoDigits.Select(Int32.Parse)
from second in secondWithSep.OptionalOrDefault()
select new TimeSpan(hour, minute, second);

public static TextParser<DateTime> DateTime =
from q1 in Character.EqualTo('"').Optional()
from date in (from date in Date
from s in Character.In('T', ' ')
from time in Time
select date + time).Try()
.Or(from time in Time
select System.DateTime.Now.Date + time).Try()
.Or(Date)
from q2 in Character.EqualTo('"').Optional().AtEnd()
where (q1 == null && q2 == null) || (q1 != null && q2 != null)
select date;
}
}
}