Skip to content

Railroad, Syntax diagrams, EBNF

Jesús Leganés-Combarro edited this page Dec 25, 2024 · 4 revisions

This is an extension and partially a compilation of what has been discussed here

We would like to add diagrams displaying syntax structure by parsing formal grammar definitions.

Alternatives and tools

The idea is not new, there are many different tools and alternatives

Kind Source code or link to the page Live
Visualizer https://github.com/jacquev6/DrawGrammar https://jacquev6.github.io/DrawGrammar/
Visualizer https://github.com/GuntherRademacher/rr https://bottlecaps.de/rr/ui
Visualizer http://dotnet.jku.at/applications/visualizer/
Parser https://www.icosaedro.it/bnf_chk/
Renderer https://www.npmjs.com/package/railroad-diagrams

Which standard to use

We have a couple of standards and different notations out there, so we should probably support as many as we can at once

Limitations

DrawGrammar

It does not perform left-reduction

RR

In RR 'a'+? and 'a'?+ are effectively reduced to 'a'*, and ('a'|)+ reduced to ('a'|)+ but not to the 'a'*, although they are effectively the same. It seems that RR needs like 2 steps for that reduction.

It also cannot compress a|a* although it obviously equals to a+

Concepts

AST

Instead of binary concatenation operator

graph
c1[,] --> A
c2[,] --> B
c2[,] --> C

c1 --> c2
Loading

Lets use multiple operands

graph
c1[,]
c1 --> A
c1 --> B
c1 --> C
Loading

Normalizing and compressing diagram

RR approach

Author of RR suggest to reduce AST down to two operators.

  • alternative
  • one or more

See the longer example:

rule ::= 'a'? 'b'+ 'c'* 'd'?+ 'e'?* 'f'+* ('g'|) ('h'|)+ ('i'|)* ('w'|)+* 'x' y 'z'

becomes

[rule]::= 'a'? 'b'+ 'c'* 'd'* 'e'* 'f'* 'g'? 'h'?+ 'i'?* 'w'?* 'x' [y] 'z'

How to represent repetitions

I have not come to a conclusion yet, there are some cases in this implementation that are really tricky, but some, that are relatively simple are not processed.

Currently I am trying to figure out, should "one or many" be a node or a "transition" just like empty transition.

May be it'll make something simpler, may be not. In that case we probably could use only alternative, and definition representing "list of consequent terms and non terms".

%e - epsilon, empty transition from the start state to the end state %r - repetition, empty transition from the end to the start state

x or 1 - only one x? or x|%e - zero or one x+ or x|%r - one or many x* or x|%e|%r - zero or many

Compression Rules

uncompressed => compressed

We can represent quantifiers as alternatives, or as alternatives with repetitions

a? => a | %e
a+ => a | %r
a* => a | %e | %r

Two quantifiers one after another does not meed being 'greedy' (like in regexp)

(a?)? => a | %e | %e => a?
(a?)+ => a | %e | %r) => a*
(a?)* => a | %e | %e | %r => a?

(a+)? => a | %r | %e => a*
(a+)+ => a | %r | %r => a+
(a+)* => a | %r | %r | %e => a*

// because a* contains %e and %r it always produces a*
a*? => a*
a*+ => a*
a** => a*

Alternatives can unite its elements

a | a? => a | a | %e => a?
a | a+ => a | a | %r => a+
a | a* => a | a | %e | %r => a*

Concatenation can unite its elements single element + quantifier

a a? a => a a? a
a a* a => a a+
a a+ a => a a a+

quantifier + quantifier

a? a? => a{0,2} # we do not support this
a? a? => a? a?
a? a+ => a? a+
a? a* => (a|%e) (a|%e)* => a*
a* a* => (a|%e)* (a|%e)* => a*
a+ a+ => a a* a a* => a a+
a+ a* => a+
a* a+ => a+

Unwrap and unite alternatives

(a) => a
a|(b|c) => a|b|c

Common prefixes

TODO

Recursion and reduction

We need to

  • eliminate recursion if possible
  • apply left-reduction, search for common prefixes and suffixes in long rules e.g. a b c d | a b becomes a b (c d)?
rule: 'a' rule | %e =>  rule: a*
rule ::= 'a' rule | 'a' => rule: a+

Examples

alt ::= a(b a)*

alt

alt ::= a b c d | b | c d 

alt