x

The corto/x project is a line-based regex based parser that invokes rules when their patterns match the input. The project relies heavily on the corto typesystem to automatically generate types & code for the rules. Example usecases for corto/x are:

match each line in a logfile against a set of actions
match HTTP requests against a set of routes

The format

Because regular expressions are a pain to read and write, and corto/x definitions are going to contain lots of them, the project introduces a syntax that allows you to write reusable & human-readable regular expressions. An example:

x/token id: "[a-zA-Z][a-zA-Z_0-9]*"
x/token number: "[0-9]", type=int32

x/pattern time: "{sec:number}.{nanosec:number}"
x/pattern header: "{:time} {component:id}"

x/parser myParser::
  x/rule ruleOne: "{:header} this is a line in a logfile called {name:([a-z])}"
  x/rule ruleTwo: "{:header} this is rule number two"

Here's what it means:

type	description
x/token	a reusable regular expression that you can use in `x/pattern` and `x/rule`
x/pattern	a reusable composite pattern that may consist of multiple `x/token`'s and `x/pattern`'s. Everything outside `{}` is considered plain text.
x/rule	equivalent to an `x/pattern` except that this will be matched against the input and an `x/pattern` won't. An `x/rule` must always appear in the scope of an `x/parser`.
x/parser	A collection of `x/rule`'s

And the x/pattern syntax:

syntax	description
`{name:type}`	A subexpression identified by `name` and of type `type`
`{type}`	An unnamed subexpression (data won't be available after parsing).
`{:type}`	A subexpression that will have the same name as the type. Equivalent to `{type:type}`.
`{(expr)}`	An inline regular (sub)expression
`{name:(expr)}`	A named inline regular (sub)expression

Note that a type always refers to an x/token or x/rule, never to an x/pattern.

Code generation

Think of x/token's as primitive types, and x/pattern as composite types, because that is actually what they are. The x/pattern's "time" and "header" from the example above translate into the following types:

struct time::
  sec, nanosec: int32
 
struct header::
  time: time
  component: string

If x definitions are part of a definition file, code will be generate for the tokens, patterns and rules. The rule implementations will be able to access data from the regular expression through an object that is passed as a parameter to the rule.

The optimizer

The x parser automatically optimizes itself based on the provided regular expressions. For example, if you have three rules with the following regular expressions:

rule1: "Hello [A-z]"
rule2: "Hello [A-z] times [0-9]"
rule3: "Hello!!!" the parser will build a parser tree that looks like this:

- "Hello"
  - "!!!": rule3
  - " [A-z]"
     - "": rule1
     - " times [0-9]": rule2

This ensures that common parts in regular expressions are never evaluated more than once per input line, which allows for very fast parsing. Especially projects with large numbers of regular expressions will see a big increase in performance when comparing with the naive approach (test each string against each full expression).

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
include		include
src		src
LICENSE		LICENSE
README.md		README.md
model.corto		model.corto
project.json		project.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

x

The format

Code generation

The optimizer

About

Releases

Packages

Contributors 2

Languages

License

cortoproject/corto-x

Folders and files

Latest commit

History

Repository files navigation

x

The format

Code generation

The optimizer

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages