syparse is a production-ready Solidity parser written in pure Erlang. It is closely aligned with the Solidity project and will be updated regularly as Solidity evolves. Solidity is a high-level language with a JavaScript-like syntax, designed to compile to code for the Ethereum Virtual Machine (EVM). The grammar.txt file of the Solidity project serves as the basis for the definition of syparse's LALR grammar.

Example: parsing the following Solidity source code.
pragma solidity ^0.4.0;
contract C {
    struct s { uint a; uint b; }
    uint x;
    mapping(uint => mapping(uint => s)) data;
}
1> {ok, {ParseTree, Tokens}} = syparse:source_to_pt("pragma solidity ^0.4.0;
contract C {
struct s { uint a; uint b; }
uint x;
mapping(uint => mapping(uint => s)) data;
}").
{ok,{{sourceUnit,
[{pragmaDirective,{identifier,"solidity"},"^0.4.0;"},
{contractDefinition,"contract",
{identifier,"C"},
[],
[{contractPart,
{structDefinition,
{identifier,"s"},
[{variableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"a"}},
{variableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"b"}}]}},
{contractPart,
{stateVariableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"x"},
[]}},
{contractPart,
{stateVariableDeclaration,
{typeName,
{mapping,
{elementaryTypeName,"uint"},
{typeName,
{mapping,{elementaryTypeName,...},{...}}}}},
[],
{identifier,"data"},
[]}}]}]},
[{'PRAGMA',1},
{'IDENTIFIER',8,"solidity"},
{'PRAGMA_DIRECTIVE',1,"^0.4.0;"},
{'CONTRACT',3},
{'IDENTIFIER',1,"C"},
{'{',3},
{'STRUCT',4},
{'IDENTIFIER',1,"s"},
{'{',4},
{'UINT',4,"uint"},
{'IDENTIFIER',1,"a"},
{';',4},
{'UINT',4,"uint"},
{'IDENTIFIER',1,"b"},
{';',4},
{'}',4},
{'UINT',5,"uint"},
{'IDENTIFIER',1,"x"},
{';',5},
{'MAPPING',6},
{'(',6},
{'UINT',6,[...]},
{'=>',6},
{'MAPPING',...},
{...}|...]}}
2> ParseTree.
{sourceUnit,
[{pragmaDirective,{identifier,"solidity"},"^0.4.0;"},
{contractDefinition,"contract",
{identifier,"C"},
[],
[{contractPart,
{structDefinition,
{identifier,"s"},
[{variableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"a"}},
{variableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"b"}}]}},
{contractPart,
{stateVariableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"x"},
[]}},
{contractPart,
{stateVariableDeclaration,
{typeName,
{mapping,
{elementaryTypeName,"uint"},
{typeName,
{mapping,
{elementaryTypeName,"uint"},
{typeName,{userDefinedTypeName,...}}}}}},
[],
{identifier,"data"},
[]}}]}]}
3> Tokens.
[{'PRAGMA',1},
{'IDENTIFIER',8,"solidity"},
{'PRAGMA_DIRECTIVE',1,"^0.4.0;"},
{'CONTRACT',3},
{'IDENTIFIER',1,"C"},
{'{',3},
{'STRUCT',4},
{'IDENTIFIER',1,"s"},
{'{',4},
{'UINT',4,"uint"},
{'IDENTIFIER',1,"a"},
{';',4},
{'UINT',4,"uint"},
{'IDENTIFIER',1,"b"},
{';',4},
{'}',4},
{'UINT',5,"uint"},
{'IDENTIFIER',1,"x"},
{';',5},
{'MAPPING',6},
{'(',6},
{'UINT',6,"uint"},
{'=>',6},
{'MAPPING',6},
{'(',6},
{'UINT',6,[...]},
{'=>',6},
{'IDENTIFIER',...},
{...}|...]
4> syparse:pt_to_source_td(ParseTree).
<<"pragma solidity ^0.4.0; contract C{struct s{uint a;uint b;} uint x; mapping(uint=>mapping(uint=>s)) data;}">>
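The Erlang shell limits the nesting depth when it prints a return value, which is why parts of the parse tree and token list above are replaced by "...". The term itself is complete and can be printed in full with rp(ParseTree) or io:format("~p~n", [ParseTree]). The complete parse tree of the example code looks as follows: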
{sourceUnit,
[{pragmaDirective,{identifier,"solidity"},"^0.4.0;"},
{contractDefinition,"contract",
{identifier,"C"},
[],
[{contractPart,
{structDefinition,
{identifier,"s"},
[{variableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"a"}},
{variableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"b"}}]}},
{contractPart,
{stateVariableDeclaration,
{typeName,{elementaryTypeName,"uint"}},
[],
{identifier,"x"},
[]}},
{contractPart,
{stateVariableDeclaration,
{typeName,
{mapping,
{elementaryTypeName,"uint"},
{typeName,
{mapping,
{elementaryTypeName,"uint"},
{typeName,{userDefinedTypeName,[{identifier,"s"}]}}}}}},
[],
{identifier,"data"},
[]}}]}]}
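Because the parse tree is a plain nested Erlang term, it can be processed with ordinary pattern matching. The following minimal sketch collects every identifier name in depth-first, left-to-right order. The module name pt_walk is hypothetical and not part of syparse. The sketch also assumes that identifiers always appear as {identifier, Name} tuples with Name a string, as in the tree above.

```erlang
-module(pt_walk).
-export([identifiers/1]).

%% Hypothetical helper, not part of syparse.
%% Assumes identifiers are encoded as {identifier, Name} with Name a string.
identifiers({identifier, Name}) when is_list(Name) ->
    [Name];
identifiers(Tuple) when is_tuple(Tuple) ->
    %% Any other node: walk all of its elements.
    identifiers(tuple_to_list(Tuple));
identifiers(List) when is_list(List) ->
    %% Child lists, and also plain strings such as "uint"
    %% (their characters fall through to the last clause).
    lists:flatmap(fun identifiers/1, List);
identifiers(_Other) ->
    %% Atoms (node tags), integers, binaries: no identifiers here.
    [].
```

For the example parse tree, pt_walk:identifiers(ParseTree) yields the names in source order, including the type name s that is referenced inside the mapping.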
The documentation for syparse is available in the project Wiki.
Due to deficiencies in the grammar definition of the rules PrimaryExpression and TypeName, the usability of the parser is still very limited.
Due to a reduce/reduce conflict between

    ElementaryTypeNameExpression = ElementaryTypeName

and

    TypeName = ElementaryTypeName
             | UserDefinedTypeName
             | Mapping
             | ArrayTypeName
             | FunctionTypeName

the grammar rule ElementaryTypeNameExpression has precedence over TypeName.
Keywords such as 'abstract', 'address', 'after', etc. must not be used as identifiers.
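Code that generates Solidity source before passing it to syparse:source_to_pt/1 may want to reject reserved words up front. The following is a minimal sketch with a hypothetical module solidity_keywords. Its keyword list is deliberately incomplete: 'abstract', 'address' and 'after' are named in this README, and 'pragma', 'contract', 'struct', 'uint' and 'mapping' appear as keyword tokens in the token list above. It is not syparse's actual keyword table.

```erlang
-module(solidity_keywords).
-export([is_reserved/1, valid_identifier/1]).

%% Hypothetical, incomplete keyword list (see the lead-in);
%% NOT the keyword table used by syparse's lexer.
-define(KEYWORDS, ["abstract", "address", "after",
                   "pragma", "contract", "struct", "uint", "mapping"]).

%% true if Name is one of the assumed reserved words (case-sensitive).
is_reserved(Name) when is_list(Name) ->
    lists:member(Name, ?KEYWORDS).

%% {ok, Name} if Name may be used as an identifier,
%% {error, {reserved, Name}} otherwise.
valid_identifier(Name) ->
    case is_reserved(Name) of
        true  -> {error, {reserved, Name}};
        false -> {ok, Name}
    end.
```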
In the rule

    NumberLiteral = '0x'? [0-9]+ (' ' NumberUnit)?

the single space (' ') between the number and the NumberUnit cannot be enforced with the parser tools leex and yecc.
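The parser works on the token stream, where the number and the unit presumably arrive as separate tokens. Whether they were separated by exactly one space is then no longer visible to it. One possible workaround, assumed here and not part of syparse, is to check the raw text of a literal with a regular expression. The module name number_literal is hypothetical. The NumberUnit alternatives are assumed from the Solidity 0.4 grammar and may differ from syparse's lexer.

```erlang
-module(number_literal).
-export([matches/1]).

%% Assumed NumberUnit alternatives (Solidity 0.4 grammar).
-define(UNITS, "wei|szabo|finney|ether|seconds|minutes|hours|days|weeks|years").

%% Checks raw source text against
%%   NumberLiteral = '0x'? [0-9]+ (' ' NumberUnit)?
%% Exactly one space is allowed between number and unit.
matches(Text) ->
    Pattern = "^(0x)?[0-9]+( (" ++ ?UNITS ++ "))?$",
    %% dollar_endonly: '$' must not match before a trailing newline.
    re:run(Text, Pattern, [dollar_endonly, {capture, none}]) =:= match.
```

With this check, "1 ether" is accepted, while "1  ether" (two spaces) and "1ether" are rejected.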
Due to a reduce/reduce conflict between

    ParameterList = '(' ( TypeName Identifier? (',' TypeName Identifier?)* )? ')'

and

    TypeNameList = '(' ( TypeName (',' TypeName )* )? ')'

the grammar rule ParameterList has precedence over TypeNameList.
Due to a reduce/reduce conflict between

    PrimaryExpression = BooleanLiteral
                      | NumberLiteral
                      | HexLiteral
                      | StringLiteral
                      | TupleExpression
                      | Identifier
                      | ElementaryTypeNameExpression

and

    UserDefinedTypeName = Identifier ( '.' Identifier )*

the grammar rule PrimaryExpression has precedence over UserDefinedTypeName.
This project was inspired by the sqlparse project of the company K2 Informatics GmbH.