A simple but effective parser combinators library written in TypeScript

salty-max/parsil

Parsil

Description

Parsil is a lightweight and flexible parser combinators library for JavaScript and TypeScript. It provides a set of composable parsers that allow you to build complex parsing logic with ease.

Key Features:

  • Composable parsers for building complex parsing logic
  • Support for error handling and error reporting
  • Extensive library of predefined parsers for common parsing tasks
  • Flexible and expressive API for defining custom parsers
  • Well-documented and easy to use

Release notes

v1.1.0

v1.2.0

v1.3.0

  • Improved type inference in the choice, sequenceOf and exactly parsers using variadic generics from TypeScript 4.x

v1.4.0

v1.5.0

v1.6.0

Installation

Install Parsil using npm:

npm install parsil

Usage

import P from 'parsil';

// Define parsers
const digitParser = P.digits;
const letterParser = P.letters;
const wordParser = P.manyOne(letterParser);

// Parse input
const input = 'Hello123';
const result = wordParser.run(input);

// The result object exposes isError, result and error (see the API section)
if (!result.isError) {
  console.log('Parsing succeeded:', result.result);
} else {
  console.error('Parsing failed:', result.error);
}

API

Methods

.run

.run starts the parsing process on an input (which may be a string, TypedArray, ArrayBuffer, or DataView), initializes the state, and returns the result of parsing the input using the parser.

Example

str('hello').run('hello')
// -> {
//      isError: false,
//      result: "hello",
//      index: 5
//    }

.fork

Takes an input to parse, and two functions to handle the results of parsing:

  • an error function that is called when parsing fails
  • a success function that is called when parsing is successful.

The fork method will run the parser on the input and, depending on the outcome, call the appropriate function.

Example

str('hello').fork(
  'hello',
  (errorMsg, parsingState) => {
    console.log(errorMsg);
    console.log(parsingState);
    return "goodbye"
  },
  (result, parsingState) => {
    console.log(parsingState);
    return result;
  }
);
// [console.log] Object {isError: false, error: null, target: "hello", index: 5, …}
// -> "hello"

str('hello').fork(
  'farewell',
  (errorMsg, parsingState) => {
    console.log(errorMsg);
    console.log(parsingState);
    return "goodbye"
  },
  (result, parsingState) => {
    console.log(parsingState);
    return result;
  }
);
// [console.log] ParseError @ index 0 -> str: Expected string 'hello', got 'farew...'
// [console.log] Object {isError: true, error: "ParseError @ index 0 -> str: Expected string 'hello',…", target: "farewell", index: 0, …}
// -> "goodbye"

.map

.map transforms the parser into a new parser that applies a function to the result of the original parser.

Example

const newParser = letters.map(x => ({
  matchType: 'string',
  value: x
}));

newParser.run('hello world')
// -> {
//      isError: false,
//      result: {
//        matchType: "string",
//        value: "hello"
//      },
//      index: 5,
//    }

.chain

.chain transforms the parser into a new parser by applying a function to the result of the original parser.

This function should return a new Parser that can be used to parse the next input.

This is used for cases where the result of a parser is needed to decide what to parse next.

Example

const lettersThenSpace = sequenceOf([
  letters,
  char(' ')
]).map(x => x[0]);

const newParser = lettersThenSpace.chain(matchedValue => {
  switch (matchedValue) {
    case 'number': return digits;

    case 'string': return letters;

    case 'bracketed': return sequenceOf([
      char('('),
      letters,
      char(')')
    ]).map(values => values[1]);

    default: return fail('Unrecognised input type');
  }
});
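
The example above stops short of running the parser. To illustrate the idea behind .chain without depending on the library, here is a minimal, self-contained sketch. It models a parser as a plain function; the Result and Parser types and the match and chain helpers are hypothetical names invented for this illustration and are not Parsil's implementation.

```typescript
// Illustrative model only: a parser is a function from (input, index) to a result.
type Result<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };

type Parser<T> = (input: string, index: number) => Result<T>;

// Hypothetical helper: match a regular expression at the current index.
const match = (re: RegExp, label: string): Parser<string> => (input, index) => {
  const m = re.exec(input.slice(index));
  return m !== null && m[0].length > 0
    ? { isError: false, result: m[0], index: index + m[0].length }
    : { isError: true, error: `Expected ${label} @ index ${index}`, index };
};

// chain: run `p`, then use its result to choose the parser for what follows.
const chain = <A, B>(p: Parser<A>, next: (a: A) => Parser<B>): Parser<B> =>
  (input, index) => {
    const first = p(input, index);
    return first.isError ? first : next(first.result)(input, first.index);
  };

const tag = match(/^[a-zA-Z]+ /, "a tag followed by a space");
const typed = chain(tag, (t) =>
  t === "number " ? match(/^[0-9]+/, "digits") : match(/^[a-zA-Z]+/, "letters")
);

console.log(typed("number 42", 0));    // success: result "42", index 9
console.log(typed("string hello", 0)); // success: result "hello", index 12
console.log(typed("number hello", 0)); // failure at index 7
```

The second parser is only chosen after the first one has produced a value, which is what distinguishes .chain from .map.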

.errorMap

.errorMap is like .map, but it transforms the error value. The function passed to .errorMap receives an object containing the current error message (error) and the index (index) at which parsing stopped.

Example

const newParser = letters.errorMap(({error, index}) => `Old message was: [${error}] @ index ${index}`);

newParser.run('1234')
// -> {
//      isError: true,
//      error: "Old message was: [ParseError @ index 0 -> letters: Expected letters] @ index 0",
//      index: 0,
//    }

Functions

anyChar

anyChar matches exactly one UTF-8 character.

Example

anyChar.run('a')
// -> {
//      isError: false,
//      result: "a",
//      index: 1,
//    }

anyChar.run('😉')
// -> {
//      isError: false,
//      result: "😉",
//      index: 4,
//    }
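
The index of 4 for the emoji is consistent with the index counting UTF-8 bytes rather than JavaScript string units; that reading is an inference from the example, not something the documentation states. The byte widths themselves can be checked with the standard TextEncoder API (utf8Length is a hypothetical helper name):

```typescript
// Count how many bytes a character occupies once encoded as UTF-8.
const utf8Length = (ch: string): number => new TextEncoder().encode(ch).length;

console.log(utf8Length("a"));  // 1
console.log(utf8Length("€"));  // 3
console.log(utf8Length("😉")); // 4
```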

anyCharExcept

anyCharExcept takes an exception parser and returns a new parser which matches exactly one character, if it is not matched by the exception parser.

Example

anyCharExcept (char ('.')).run('This is a sentence.')
// -> {
//   isError: false,
//   result: 'T',
//   index: 1,
//   data: null
// }

const manyExceptDot = many (anyCharExcept (char ('.')))
manyExceptDot.run('This is a sentence.')
// -> {
//      isError: false,
//      result: ['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 'e', 'n', 't', 'e', 'n', 'c', 'e'],
//      index: 18,
//      data: null
//    }

bit

bit parses the bit at the current index from a DataView.

Example

const parser = bit
const data = new Uint8Array([42]).buffer
parser.run(new DataView(data))
// -> {
//      isError: false,
//      result: 0,
//      index: 1,
//    }
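
To see why the first bit of 42 is 0, it can help to read the bits by hand. The sketch below is not Parsil's code; readBit is a hypothetical helper, and the most-significant-bit-first ordering is an assumption inferred from the bit, one and zero examples in this README.

```typescript
// Read the bit at `bitIndex`, counting from the most significant bit of the
// first byte (assumed ordering, see the note above).
const readBit = (view: DataView, bitIndex: number): number => {
  const byte = view.getUint8(Math.floor(bitIndex / 8));
  const offsetInByte = 7 - (bitIndex % 8);
  return (byte >> offsetInByte) & 1;
};

const view = new DataView(new Uint8Array([42]).buffer); // 42 = 0b00101010
const bits = Array.from({ length: 8 }, (_, i) => readBit(view, i));
console.log(bits); // [0, 0, 1, 0, 1, 0, 1, 0]
```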

between

between takes 3 parsers, a left parser, a right parser, and a value parser, returning a new parser that matches a value matched by the value parser, between values matched by the left parser and the right parser.

This parser can easily be partially applied with char ('(') and char (')') to create a betweenRoundBrackets parser, for example.

Example

const newParser = between (char ('<')) (char ('>')) (letters);

newParser.run('<hello>')
// -> {
//      isError: false,
//      result: "hello",
//      index: 7,
//    }

const betweenRoundBrackets = between (char ('(')) (char (')'));

betweenRoundBrackets (many (letters)).run('(hello world)')
// -> {
//      isError: true,
//      error: "ParseError @ index 6 -> between: Expected character ')', got ' '",
//      index: 6,
//    }

char

char takes a character and returns a parser that matches that character exactly one time.

Example

char ('h').run('hello')
// -> {
//      isError: false,
//      result: "h",
//      index: 1,
//    }

choice

choice is a parser combinator that tries each parser in a given list of parsers, in order, until one succeeds.

If a parser succeeds, it consumes the relevant input and returns the result.

If no parser succeeds, choice fails with an error message.

Example

const newParser = choice ([
  digit,
  char ('!'),
  str ('hello'),
  str ('pineapple')
])

newParser.run('hello world')
// -> {
//      isError: false,
//      result: "hello",
//      index: 5,
//    }

coroutine

coroutine is a parser that allows for advanced control flow and composition of parsers.

Example

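// The callback parameter is named `run` because `yield` is a reserved word in strict mode
const parserFn: ParserFn<number> = (run) => {
  // Each call executes a parser at the current position and returns its result
  const x = run(parserA);
  const y = run(parserB);
  return x + y;
};

const coroutineParser = coroutine(parserFn);
coroutineParser.run(input);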
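
Because the snippet above relies on placeholders (parserA, parserB, input), here is a self-contained sketch of how a coroutine-style combinator could work. It is an illustration under stated assumptions, not Parsil's implementation: the Parser, Run, match and coroutine definitions below are hypothetical, and the body is aborted with an exception when a step fails.

```typescript
type Failure = { isError: true; error: string; index: number };
type Success<T> = { isError: false; result: T; index: number };
type Parser<T> = (input: string, index: number) => Success<T> | Failure;
type Run = <A>(p: Parser<A>) => A;

// Hypothetical helper: match a regular expression at the current index.
const match = (re: RegExp, label: string): Parser<string> => (input, index) => {
  const m = re.exec(input.slice(index));
  return m !== null
    ? { isError: false, result: m[0], index: index + m[0].length }
    : { isError: true, error: `Expected ${label} @ index ${index}`, index };
};

// coroutine: the body calls `run` step by step, in ordinary imperative style.
const coroutine = <T>(body: (run: Run) => T): Parser<T> => (input, index) => {
  const state: { index: number; failure: Failure | null } = { index, failure: null };
  const run: Run = (p) => {
    const r = p(input, state.index);
    if (r.isError) {
      state.failure = r;
      throw r; // abort the body at the first failing step
    }
    state.index = r.index;
    return r.result;
  };
  try {
    const result = body(run);
    return { isError: false, result, index: state.index };
  } catch (e) {
    // Re-throw anything that is not our own failure signal.
    if (state.failure !== null && e === state.failure) return state.failure;
    throw e;
  }
};

const digitsP = match(/^[0-9]+/, "digits");
const plusP = match(/^\+/, "'+'");

const sum = coroutine((run) => {
  const x = Number(run(digitsP));
  run(plusP);
  const y = Number(run(digitsP));
  return x + y;
});

console.log(sum("12+30", 0)); // success: result 42, index 5
console.log(sum("12-30", 0)); // failure at index 2
```

The appeal of this style is that intermediate results are ordinary local variables, so later steps can depend on earlier ones without nesting .chain calls.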

digit

digit is a parser that matches exactly one numerical digit /[0-9]/.

Example

digit.run('99 bottles of beer on the wall')
// -> {
//      isError: false,
//      result: "9",
//      index: 1,
//    }

digits

digits matches one or more numerical digits /[0-9]/.

Example

digits.run('99 bottles of beer on the wall')
// -> {
//      isError: false,
//      result: "99",
//      index: 2,
//    }

endOfInput

endOfInput is a parser that only succeeds when there is no more input to be parsed.

Example

const newParser = sequenceOf ([
  str ('abc'),
  endOfInput
]);

newParser.run('abc')
// -> {
//      isError: false,
//      result: [ "abc", null ],
//      index: 3,
//      data: null
//    }

newParser.run('')
// -> {
//      isError: true,
//      error: "ParseError @ index 0 -> endOfInput: Expecting string 'abc', but got end of input.",
//      index: 0,
//      data: null
//    }

everyCharUntil

everyCharUntil takes a termination parser and returns a new parser which matches every possible character up until a value is matched by the termination parser. When a value is matched by the termination parser, it is not "consumed".

Example

everyCharUntil (char ('.')).run('This is a sentence.This is another sentence')
// -> {
//      isError: false,
//      result: 'This is a sentence',
//      index: 18,
//      data: null
//    }

// termination parser doesn't consume the termination value
const newParser = sequenceOf ([
  everyCharUntil (char ('.')),
  str ('This is another sentence')
]);

newParser.run('This is a sentence.This is another sentence')
// -> {
//      isError: true,
//      error: "ParseError (position 18): Expecting string 'This is another sentence', got '.This is another sentenc...'",
//      index: 18,
//      data: null
//    }

everythingUntil

everythingUntil takes a termination parser and returns a new parser which matches every possible numerical byte up until a value is matched by the termination parser. When a value is matched by the termination parser, it is not "consumed".

Example

everythingUntil (char ('.')).run('This is a sentence.This is another sentence')
// -> {
//      isError: false,
//      result: [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 101, 110, 116, 101, 110, 99, 101],
//      index: 18,
//      data: null
//    }

// termination parser doesn't consume the termination value
const newParser = sequenceOf ([
  everythingUntil (char ('.')),
  str ('This is another sentence')
]);

newParser.run('This is a sentence.This is another sentence')
// -> {
//      isError: true,
//      error: "ParseError (position 18): Expecting string 'This is another sentence', got '.This is another sentenc...'",
//      index: 18,
//      data: null
//    }

exactly

exactly takes a positive number and returns a function. That function takes a parser and returns a new parser which matches the given parser the specified number of times.

Example

const newParser = exactly (4)(letter)

newParser.run('abcdef')
// -> {
//      isError: false,
//      result: [ "a", "b", "c", "d" ],
//      index: 4,
//      data: null
//    }

newParser.run('abc')
// -> {
//      isError: true,
//      error: 'ParseError @ index 0 -> exactly: Expecting 4 letter, but got end of input.',
//      index: 0,
//      data: null
//    }

newParser.run('12345')
// -> {
//      isError: true,
//      error: "ParseError @ index 0 -> exactly: Expecting 4 letter, but got '1'",
//      index: 0,
//      data: null
//    }

fail

fail takes an error message string and returns a parser that always fails with the provided error message.

Example

fail('Nope').run('hello world')
// -> {
//      isError: true,
//      error: "Nope",
//      index: 0,
//    }

int

int reads the next n bits from the input and interprets them as a signed integer.

Example

const parser = int(8)
const input = new Uint8Array([-42])
const result = parser.run(new DataView(input.buffer))
// -> {
//      isError: false,
//      result: -42,
//      index: 8,
//    }
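
The example stores -42 in a Uint8Array, which cannot hold negative numbers: the value wraps to 214, and reading those 8 bits back as a signed integer recovers -42. A small sketch of that two's-complement conversion (toSigned is a hypothetical helper, not part of Parsil):

```typescript
// Interpret the low `n` bits of `raw` as a two's-complement signed integer.
const toSigned = (raw: number, n: number): number => {
  const signBit = 2 ** (n - 1);
  return raw >= signBit ? raw - 2 ** n : raw;
};

// Typed arrays wrap values modulo 256, so -42 is stored as 214.
const stored = new DataView(new Uint8Array([-42]).buffer).getUint8(0);
console.log(stored);              // 214
console.log(toSigned(stored, 8)); // -42
console.log(toSigned(42, 8));     // 42
```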

letter

letter is a parser that matches exactly one alphabetical letter /[a-zA-Z]/.

Example

letter.run('hello world')
// -> {
//      isError: false,
//      result: "h",
//      index: 1,
//    }

letters

letters matches one or more alphabetical letters /[a-zA-Z]/.

Example

letters.run('hello world')
// -> {
//      isError: false,
//      result: "hello",
//      index: 5,
//    }

lookAhead

lookAhead takes a look-ahead parser and returns a new parser that matches using the look-ahead parser, but without consuming input.

Example

const newParser = sequenceOf ([
  str ('hello '),
  lookAhead (str ('world')),
  str ('wor')
]);

newParser.run('hello world')
// -> {
//      isError: false,
//      result: [ "hello ", "world", "wor" ],
//      index: 9,
//      data: null
//    }
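
Conceptually, "without consuming input" means the result is kept while the index is reset to where the look-ahead started. A minimal sketch of that idea, using a hypothetical function-based parser model rather than Parsil's own types:

```typescript
type Result<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type Parser<T> = (input: string, index: number) => Result<T>;

// Hypothetical helper: match a literal string at the current index.
const literal = (s: string): Parser<string> => (input, index) =>
  input.startsWith(s, index)
    ? { isError: false, result: s, index: index + s.length }
    : { isError: true, error: `Expected '${s}' @ index ${index}`, index };

// On success, keep the result but report the original index.
const lookAheadSketch = <T>(p: Parser<T>): Parser<T> => (input, index) => {
  const r = p(input, index);
  return r.isError ? r : { isError: false, result: r.result, index };
};

console.log(literal("world")("hello world", 6));                  // index advances to 11
console.log(lookAheadSketch(literal("world"))("hello world", 6)); // index stays at 6
```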

many

many is a parser combinator that applies a given parser zero or more times.

It collects the results of each successful parse into an array, and stops when the parser can no longer match the input.

It doesn't fail when the parser doesn't match the input at all; instead, it returns an empty array.

Example

const newParser = many (str ('abc'))

newParser.run('abcabcabcabc')
// -> {
//      isError: false,
//      result: [ "abc", "abc", "abc", "abc" ],
//      index: 12,
//    }

newParser.run('')
// -> {
//      isError: false,
//      result: [],
//      index: 0,
//    }

newParser.run('12345')
// -> {
//      isError: false,
//      result: [],
//      index: 0,
//    }
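
The zero-or-more behaviour described above can be pictured as a simple loop. The following is a sketch on a hypothetical function-based parser model (the types and the literal and manySketch helpers are assumptions for illustration, not Parsil's source):

```typescript
type Result<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type Parser<T> = (input: string, index: number) => Result<T>;

// Hypothetical helper: match a literal string at the current index.
const literal = (s: string): Parser<string> => (input, index) =>
  input.startsWith(s, index)
    ? { isError: false, result: s, index: index + s.length }
    : { isError: true, error: `Expected '${s}' @ index ${index}`, index };

// Apply `p` repeatedly, collecting results until it stops matching.
const manySketch = <T>(p: Parser<T>): Parser<T[]> => (input, index) => {
  const results: T[] = [];
  let current = index;
  while (true) {
    const r = p(input, current);
    // Stop on failure, or when nothing was consumed (avoids an infinite loop).
    if (r.isError || r.index === current) break;
    results.push(r.result);
    current = r.index;
  }
  // Never an error: zero matches simply yields an empty array.
  return { isError: false, result: results, index: current };
};

console.log(manySketch(literal("abc"))("abcabcabcabc", 0)); // four results, index 12
console.log(manySketch(literal("abc"))("12345", 0));        // [], index 0
```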

manyOne

manyOne is similar to many, but it requires the input parser to match the input at least once.

Example

const newParser = manyOne (str ('abc'))

newParser.run('abcabcabcabc')
// -> {
//      isError: false,
//      result: [ "abc", "abc", "abc", "abc" ],
//      index: 12,
//    }

newParser.run('')
// -> {
//   isError: true,
//   error: "ParseError @ index 0 -> manyOne: Expected to match at least one value",
//   index: 0,
//   data: null
// }

newParser.run('12345')
// -> {
//   isError: true,
//   error: "ParseError @ index 0 -> manyOne: Expected to match at least one value",
//   index: 0,
//   data: null
// }

one

one parses the bit at the current index from a DataView and expects it to be 1.

Example

const parser = one
const data = new Uint8Array([234]).buffer
parser.run(new DataView(data))
// -> {
//      isError: false,
//      result: 1,
//      index: 1,
//    }
const otherData = new Uint8Array([42]).buffer
parser.run(new DataView(otherData))
// -> {
//      isError: true,
//      error: "ParseError @ index 0 -> one: Expected 1 but got 0",
//      index: 0,
//    }

optionalWhitespace

optionalWhitespace is a parser that matches zero or more whitespace characters.

Example

const newParser = sequenceOf ([
  str ('hello'),
  optionalWhitespace,
  str ('world')
]);

newParser.run('hello           world')
// -> {
//      isError: false,
//      result: [ "hello", "           ", "world" ],
//      index: 21,
//    }

newParser.run('helloworld')
// -> {
//      isError: false,
//      result: [ "hello", "", "world" ],
//      index: 10,
//    }

peek

peek matches exactly one numerical byte without consuming any input.

Example

peek.run('hello world')
// -> {
//      isError: false,
//      result: 104,
//      index: 0,
//      data: null
//    }

sequenceOf([
  str('hello'),
  peek
]).run('hello world')
// -> {
//      isError: false,
//      result: [ "hello", 32 ],
//      index: 5,
//      data: null
//    }

possibly

possibly takes an attempt parser and returns a new parser which tries to match using the attempt parser. If it is unsuccessful, it returns a null value and does not "consume" any input.

Example

const newParser = sequenceOf ([
  possibly (str ('Not Here')),
  str ('Yep I am here')
]);

newParser.run('Yep I am here')
// -> {
//      isError: false,
//      result: [ null, "Yep I am here" ],
//      index: 13,
//    }

rawString

rawString matches a string of characters exactly as provided.

Each character in the input string is converted to its corresponding ASCII code and a parser is created for each ASCII code.

The resulting parsers are chained together using sequenceOf to ensure they are parsed in order.

The parser succeeds if all characters are matched in the input and fails otherwise.

Example

const parser = rawString('Hello')
parser.run('Hello')
// -> {
//      isError: false,
//      result: [72, 101, 108, 108, 111],
//      index: 40,
//    }
parser.run('World')
// -> {
//      isError: true,
//      error: "ParseError -> rawString: Expected character H, but got W",
//      index: 8,
//    }
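
The result array and the index of 40 can both be reproduced with plain JavaScript string methods. Reading the index as a bit offset (5 bytes of 8 bits each) is an inference from this example, not a documented guarantee.

```typescript
// Convert each character of the string to its character code.
const text = "Hello";
const codes = Array.from(text, (ch) => ch.charCodeAt(0));

console.log(codes);            // [72, 101, 108, 108, 111]
console.log(codes.length * 8); // 40
```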

recursive

recursive takes a function that returns a parser (a thunk), and returns that same parser. This is needed in order to create recursive parsers because JavaScript is an eager language.

In the following example both the value parser and the matchArray parser are defined in terms of each other, so one of them must be defined using recursive.

Example

const value = recursive (() => choice ([
  matchNum,
  matchStr,
  matchArray
]));

const betweenSquareBrackets = between (char ('[')) (char (']'));
const commaSeparated = sepBy (char (','));
const spaceSeparated = sepBy (char (' '));

const matchNum = digits;
const matchStr = letters;
const matchArray = betweenSquareBrackets (commaSeparated (value));

spaceSeparated(value).run('abc 123 [42,def] 45')
// -> {
//      isError: false,
//      result: [ "abc", "123", [ "42", "def" ], "45" ],
//      index: 19,
//    }
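
To make the eagerness problem concrete, here is a small self-contained sketch. It uses a hypothetical function-based parser model (the types and the charSketch, choiceSketch, parenthesized and recursiveSketch helpers are assumptions for illustration, not Parsil's source). The thunk delays the lookup of parens until parse time, by which point it has been initialised.

```typescript
type Result<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type Parser<T> = (input: string, index: number) => Result<T>;

// Defer building the parser until it is first used.
const recursiveSketch = <T>(thunk: () => Parser<T>): Parser<T> =>
  (input, index) => thunk()(input, index);

const charSketch = (c: string): Parser<string> => (input, index) =>
  input[index] === c
    ? { isError: false, result: c, index: index + 1 }
    : { isError: true, error: `Expected '${c}' @ index ${index}`, index };

const choiceSketch = <T>(parsers: Parser<T>[]): Parser<T> => (input, index) => {
  for (const p of parsers) {
    const r = p(input, index);
    if (!r.isError) return r;
  }
  return { isError: true, error: `No choice matched @ index ${index}`, index };
};

// Match `p` surrounded by round brackets and return the inner result.
const parenthesized = <T>(p: Parser<T>): Parser<T> => (input, index) => {
  const open = charSketch("(")(input, index);
  if (open.isError) return open;
  const inner = p(input, open.index);
  if (inner.isError) return inner;
  const close = charSketch(")")(input, inner.index);
  if (close.isError) return close;
  return { isError: false, result: inner.result, index: close.index };
};

// Without the thunk, `choiceSketch([charSketch("x"), parens])` would read
// `parens` before it is initialised and throw a ReferenceError.
const expr: Parser<string> = recursiveSketch(() => choiceSketch([charSketch("x"), parens]));
const parens: Parser<string> = parenthesized(expr);

console.log(expr("((x))", 0)); // success: result "x", index 5
```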

regex

regex takes a RegExp and returns a parser that matches as many characters as the RegExp matches.

Example

regex(/^[hH][aeiou].{2}o/).run('hello world')
// -> {
//      isError: false,
//      result: "hello",
//      index: 5,
//    }

sepBy

sepBy takes two parsers - a separator parser and a value parser - and returns a new parser that matches zero or more values from the value parser that are separated by values of the separator parser. Because it will match zero or more values, this parser will fail if a value is followed by a separator but NOT another value. If there's no value, the result will be an empty array, not failure.

Example

const newParser = sepBy (char (',')) (letters)

newParser.run('some,comma,separated,words')
// -> {
//      isError: false,
//      result: [ "some", "comma", "separated", "words" ],
//      index: 26,
//    }

newParser.run('')
// -> {
//      isError: false,
//      result: [],
//      index: 0,
//    }

newParser.run('12345')
// -> {
//      isError: false,
//      result: [],
//      index: 0,
//    }
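
The behaviour described above (an empty array when nothing matches, but a failure on a trailing separator) can be sketched as follows. This is an illustration on a hypothetical function-based parser model; the types and the sepBySketch, comma and lettersSketch helpers are assumptions, not Parsil's source.

```typescript
type Result<T> =
  | { isError: false; result: T; index: number }
  | { isError: true; error: string; index: number };
type Parser<T> = (input: string, index: number) => Result<T>;

const comma: Parser<string> = (input, index) =>
  input[index] === ","
    ? { isError: false, result: ",", index: index + 1 }
    : { isError: true, error: `Expected ',' @ index ${index}`, index };

const lettersSketch: Parser<string> = (input, index) => {
  const m = /^[a-zA-Z]+/.exec(input.slice(index));
  return m !== null
    ? { isError: false, result: m[0], index: index + m[0].length }
    : { isError: true, error: `Expected letters @ index ${index}`, index };
};

// Curried like the README examples: separator first, then the value parser.
const sepBySketch = <S>(sep: Parser<S>) => <T>(value: Parser<T>): Parser<T[]> =>
  (input, index) => {
    const results: T[] = [];
    const first = value(input, index);
    // No value at all is fine: succeed with an empty array.
    if (first.isError) return { isError: false, result: results, index };
    results.push(first.result);
    let current = first.index;
    while (true) {
      const s = sep(input, current);
      if (s.isError) break;
      const v = value(input, s.index);
      // A separator that is not followed by a value is an error.
      if (v.isError) return v;
      results.push(v.result);
      current = v.index;
    }
    return { isError: false, result: results, index: current };
  };

const words = sepBySketch(comma)(lettersSketch);
console.log(words("some,comma,separated,words", 0)); // four values, index 26
console.log(words("12345", 0));                      // [], index 0
console.log(words("some,", 0));                      // failure: trailing separator
```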

sepByOne

sepByOne is the same as sepBy, except that it matches one or more occurrences.

Example

const newParser = sepByOne(char (','))(letters)

newParser.run('some,comma,separated,words')
// -> {
//      isError: false,
//      result: [ "some", "comma", "separated", "words" ],
//      index: 26,
//    }

newParser.run('1,2,3')
// -> {
//      isError: true,
//      error: "ParseError @ index0 -> sepByOne: Expected to match at least one separated value",
//      index: 0,
//    }

sequenceOf

sequenceOf is a parser combinator that accepts an array of parsers and applies them in sequence to the input. If all parsers succeed, it returns an array of their results.

If any parser fails, it fails immediately and returns the error state of that parser.

Example

const newParser = sequenceOf ([
  str ('he'),
  letters,
  char (' '),
  str ('world'),
])

newParser.run('hello world')
// -> {
//      isError: false,
//      result: [ "he", "llo", " ", "world" ],
//      index: 11,
//    }

startOfInput

startOfInput is a parser that only succeeds when the parser is at the beginning of the input.

Example

const mustBeginWithHeading = sequenceOf([
    startOfInput,
    str("# ")
  ]);
const newParser = between(mustBeginWithHeading)(endOfInput)(everyCharUntil(endOfInput));

newParser.run('# Heading');
// -> {
//      isError: false,
//      result: "# Heading",
//      index: 9,
//      data: null
//    }

newParser.run(' # Heading');
// -> {
//      isError: true,
//      error: "ParseError @ index 0 -> startOfInput: Expecting string '# ', got ' #...'",
//      index: 0,
//      data: null
//    }

succeed

succeed is a parser combinator that always succeeds and produces a constant value. It ignores the input state and returns the specified value as the result.

Example

const parser = succeed(42);
parser.run("hello world");
// Returns:
// {
//   isError: false,
//   result: 42,
//   index: 0
// }

str

str tries to match a given string against its input.

Example

str('hello').run('hello world')
// -> {
//      isError: false,
//      result: "hello",
//      index: 5,
//    }

uint

uint reads the next n bits from the input and interprets them as an unsigned integer.

Example

const parser = uint(8)
const input = new Uint8Array([42])
const result = parser.run(new DataView(input.buffer))
// -> {
//      isError: false,
//      result: 42,
//      index: 8,
//    }

whitespace

whitespace is a parser that matches one or more whitespace characters.

Example

const newParser = sequenceOf ([
  str ('hello'),
  whitespace,
  str ('world')
]);

newParser.run('hello           world')
// -> {
//      isError: false,
//      result: [ "hello", "           ", "world" ],
//      index: 21,
//    }

newParser.run('helloworld')
// -> {
//      isError: true,
//      error: "ParseError 'many1' (position 5): Expected to match at least one value",
//      index: 5,
//    }

zero

zero parses the bit at the current index from a DataView and expects it to be 0.

Example

const parser = zero
const data = new Uint8Array([42]).buffer
parser.run(new DataView(data))
// -> {
//      isError: false,
//      result: 0,
//      index: 1,
//    }
const otherData = new Uint8Array([234]).buffer
parser.run(new DataView(otherData))
// -> {
//      isError: true,
//      error: "ParseError @ index 0 -> zero: Expected 0 but got 1",
//      index: 0,
//    }