Skip to content
Grigore Rosu edited this page Aug 19, 2013 · 13 revisions

Design proposal for standardizing the basic representation of KIL for programs and definitions

#Introduction

The objective of this design proposal is to separate concerns even further in the design and implementation of the K tool.

This proposal is about the syntax of core K, which for simplicity I'm going to call Kore. Recall that K in its full generality requires a two-phase parser: one parsing the syntax declarations and generating a parser for the rules (or contexts, macros, etc.), and the other parsing the rules using the parser generated in phase one. Kore is a fragment of K meant to be very easy to parse, eliminating the need for a two-phase parser. From a user perspective, the difference between K and Kore is that instead of writing terms which need a concrete-syntax parser to parse, such as A + B, one writes terms using the KAST syntax, such as '_+_(A,B).

The idea is that Kore should be nothing but a textual reflection of the KIL of a K definition, the same way the KAST of a program is nothing but a textual reflection of the KIL of that program. We are going to have a trivial "loader" of a K definition in Kore format into KIL data-structures, which in principle is a parser, but a very simple one, infinitely simpler than the current K parser that Radu is working on.

There are LOTS of advantages for doing this. I'll only mention the beautiful separation of concerns that we will achieve, where the complex K parser is completely separated out from the kompile tool, so both of them can then be designed and improved in isolation. Further, I hope that the syntax of Kore will be simple enough that we will not need to change it as we change the support for parsing in K, so middle-end and back-end developers will not depend on other crazy syntactic tricks that we may apply to the front-end (e.g., sort inferencers, syntactic conventions for builtins like using X|->I in cells, etc.).

Whatever I say next can and will be obviously changed/improved, so feel free to comment or fight for your point. Actually, I really hope that this message will start you on to say all the things that you wanted to say about this topic. Keep in mind, though, that we want a fragment of K here, which should be trivial to parse.

#GeneriK

Let us start with the generic syntax of K, that is, the K syntax for defining language syntax and semantics which is independent upon the particular language that is being defined. For simplicity, we refer to this syntax as GeneriK. At this level there is only one generic sort for all syntax, K, which has no constructs. A parser for GeneriK would put all the K terms into "bubbles":

syntax String ::= usual double-quoted strings
syntax NAME ::= any sequences of characters; each context may further restrict them (e.g., variables and sorts start with capital letter, etc.)
syntax Comment ::= usual comments (/* multiline */ and // one line)

syntax K   // GeneriK does not care about the concrete or abstract syntax; all syntactic terms are parsed as K "bubbles" at this stage

syntax Definition ::= List{Module | Comment}{""}   // maybe this notation should be used for non-empty lists?
syntax Module ::= "module" NAME ModuleBody "endmodule"
syntax ModuleBody ::= List{Sentence | Comment}{""}{""}
syntax Sentence ::= Import | Configuration | Syntax | Context | Rule

syntax Import ::= "import" NAME
syntax Configuration ::= "configuration" K OptionalAttributes   // cells and configurations are represented as K terms, too
syntax Syntax ::= "syntax" Sort OptionalAttributes   // sort declarations, possibly with attributes
                | "syntax" Sort "::=" List{Production}{"|"}
syntax Sort ::= NAME
syntax Production ::= List{String | Sort}{""} OptionalAttributes
syntax Context ::= "context" K OptionalAttributes
syntax Rule ::= "rule" K OptionalAttributes

syntax OptionalAttributes ::= "" | "[" Attributes "]"
syntax Attributes ::=  List{Attribute}{","}{""}
syntax Attribute ::= NAME OptionalNAMES
syntax OptionalNAMES ::= "" | "(" NAMES ")"
syntax NAMES ::= List{NAME}{","}

The K terms marked as "bubbles" by the GeneriK parser need to be further parsed with specialized parsers (e.g., the Kore parser, or the parser for Full K, etc.; see next). Note that each such K term bubble above appears in a construct that also allows optional attributes. One of those attributes can, for example, state the specific parser to be used to parse the corresponding K term bubble (in case the default one, probably the one for Full K, generates ambiguities).

Important note: The GeneriK parser should be as simple and efficient as possible. It should not attempts to parse any K terms at all and it should give very nice error messages. Any other subsequent parsers for K will only be concerned with parsing the K term bubbles (possibly in parallel).

#Kore

Kore adds to GeneriK a more specific syntax for K terms, but one which is still very easy to parse (without requiring 2 phase parsing). Specifically, it adds to GeneriK the following: KAST syntax; support for variables, sort memberships and cells; and the rewrite relation => and the HOLE.

##KAST

KAST is meant to represent any ground K terms (including ones corresponding to programs in AST format, obtained by removing the ~> construct):

syntax K ::= List{KItem}{"~>"}{".K"}                            // generic
syntax KItem ::= KToken  |  KLabel "(" KList ")"                // generic
syntax KList ::= List{K}{","}{".KList"}                         // generic

syntax KToken ::= "#" String   // tokens start with # and are immediately (no space) followed by a  string; e.g., #"abc", #"735", etc.
syntax KLabel ::= NAME   // ones starting with single-quote are generated automatically from productions

##Variables and Sort Memberships

Adding support for sorting/casting and variables:

syntax Variable ::= NAME                  // possibly restricted to start with capital letter

syntax K      ::= Variable
                | K ":" Sort              // sorting arbitrary K terms
                | "(" K ")"   [bracket]
syntax KItem  ::= Variable ":" "KItem"    // should have priority over the above
syntax KList  ::= Variable ":" "KList"
syntax KToken ::= Variable ":" "KToken"
syntax KLabel ::= Variable ":" "KLabel"

##Cells and Configurations

In the long run we want cells and configurations to be embedded into K the same way we decided to embed the builtins (integers, maps, etc.). Probably <k> T </k> will become something of the form '<k>_</k>(T). However, for now we decided to make a notational exception and allow cells to be written in an XML-ish notation in Kore (this is orthogonal to how cells are represented in KIL):

syntax K ::= Cells
syntax Cells ::= List{Cell}{""}
syntax Cell ::= "<" NAME OptionalTags ">" K "</" NAME ">"   // the two NAME should be the same
syntax OptionalTags ::= List{NAME "=" String}{""}{""}

##Rewriting and HOLE

syntax K ::= KList "=>" KList   // should be greedy
syntax K ::= "HOLE"   // further restricted to only appear in certain contexts

#Example: IMP in Kore

Below is IMP defined in the Kore language above, including all the builtins that it needs.

###IMP Syntax and Builtins

We can define the syntax either using sorts and productions, which are easier to read and hold more information that may be useful in some backends, or using only KLabels, which is more minimal but also loses sort information.

Here is the syntax using sorts and productions. This is almost identical to the IMP syntax in full K, except for purely parsing-related features, such as bracket productions, operation priorities, prefer and avoid attributes, etc.:

{{{ module BUILTINS sort "Int" [builtin(Integer)] sort "Bool" [builtin(Boolean)] sort "Id" [builtin(Identifier)] syntax Int ::= Int "/Int" Int [builtin(IntegerDivision)] | Int "+Int" Int [builtin(IntegerAddition)] syntax Bool ::= Int "<=Int" Int [builtin(IntegerLeq)] | Int "=/=Int" Int [builtin(IntegerNeq)] | "notBool" Bool [builtin(BoolNot)] syntax Map ::= ".Map" [builtin(MapEmpty)] | lookup(Map) [builtin(MapLookup)] | update(Map,K,K) [builtin(MapUpdate)] | addBinding(Map,K,K) [builtin(MapAddBinding)] syntax Bool ::= K "in" Map [builtin(MapMembership)] syntax K ::= keys(Map) [builtin(MapGetKeys)] endmodule

module IMP-SYNTAX imports BUILTINS syntax AExp ::= Int | Id | AExp "/" AExp [strict] | AExp "+" AExp [strict] syntax BExp ::= Bool | AExp "<=" AExp [seqstrict] | "!" BExp [strict] | BExp "&&" BExp [strict(1)] syntax Block ::= "{" "}" | "{" Stmt "}" syntax Stmt ::= Block | Id "=" AExp ";" [strict(2)] | "if" "(" BExp ")" Block "else" Block [strict(1)] | "while" "(" BExp ")" Block | Stmt Stmt syntax Pgm ::= "int" Ids ";" Stmt syntax Ids ::= List{Id}{","}{".Ids"} endmodule }}}

Here is the KLabel alternative of the syntax above, which may be prefered in some cases:

{{{ module BUILTINS syntax KLabel ::= "Int" [builtin(Integer)] // can be added automatically, from 'sort "Int" [builtin(Integer)]' syntax KLabel ::= "Bool" [builtin(Boolean)] syntax KLabel ::= "Id" [builtin(Identifier)] syntax KLabel ::= "'/Int" [builtin(IntegerDivision)] | "'+Int" [builtin(IntegerAddition)] | "'<=Int" [builtin(IntegerLeq)] | "'=/=Int" [builtin(IntegerNeq)] | "'notBool_" [builtin(BoolNot)] | "'.Map" [builtin(MapEmpty)] | "'lookup_" [builtin(MapLookup)] | "'update_" [builtin(MapUpdate)] | "'addBinding_" [builtin(MapAddBinding)] | "'in" [builtin(MapMembership)] | "'keys_" [builtin(MapGetKeys)] endmodule

module IMP-SYNTAX imports BUILTINS syntax KLabel ::= "'/" [strict] | "'+" [strict] | "'<=" [seqstrict] | "'!" [strict] | "'&&" [strict(1)] | "'{}" | "'{}" | "'=;" [strict(2)] | "'if(_)then_" [strict(1)] | "'while(_)" | "'__" | "'int;" | "',_" | "'.Ids" endmodule }}}

###IMP Semantics

Below I'm allowing unsorted variables, with the simple and locally resolved semantics that they have the sort K.

{{{ module IMP imports IMP-SYNTAX rule isKResult(I:Int) => Bool(#"true") rule isKResult(B:Bool) => Bool(#"true")

configuration $PGM:K '.Map(.KList)

rule X:Id => 'lookup(Sigma,X:Id) ... Sigma rule '/(I1:Int,I2:Int) => '/Int(I1:Int,I2:Int) when '=/=Int(I2:Int,Int(#"0")) rule '+(I1:Int,I2:Int) => '+(I1:Int,I2:Int) rule '<=(I1:Int,I2:Int) => '<=(I1:Int,I2:Int) rule '!(T:Bool) => 'notBool(T:Bool) rule '&&(Bool(#"true"),B) => B rule '&&(Bool(#"false"),) => Bool(#"false") rule '{}(.KList) => .K [structural] rule '{}(S) => S [structural] rule '=;(X:Id,I:Int) => .K ... Sigma => 'update(Sigma,X:Id,I:Int) rule '(S1,S2) => S1 ~> S2 [structural] rule 'if(_)else(Bool(#"true"),S,) => S rule 'if(_)else(Bool(#"false"),Bool),,S) => S rule 'while(_)(B,S) => 'if(_)else(B, '{}('(S, 'while(_)(B,S))), '{}(.KList)) [structural] rule 'int;((',(X:Id, Xs) => Xs), ) Sigma => 'addBinding(Sigma, X:Id, Int(#"0")) when 'notBool('in(X:Id, 'keys(Sigma))) rule 'int;('.Ids(.KList),S) => S [structural] endmodule }}}

##Kuestions

Q1: Do we really need to keep strings in tokens? That is, why not just #token(735, Int) instead of #token("735", "Int")? Of course, the projections will still give us strings. For example, getSort(#token(735,Int)) will evaluate to "Int". So should we replace {{{syntax KToken ::= #token(String,String)}}} with {{{syntax KToken ::= #token(NAME,NAME)}}} ?

A1: Problem solved. I did what Dwight suggested, that is, added KTokenSort, which eliminates the quotes around the sort name only. But that is better actually that what i wanted. Besides, it also gives a nice way to associate builtins to tokens.

Q2: Do we need a more elaborate syntax for declaring configurations and cells, or we can limit ourselves to KItems? How about modularity, where we want to name and reuse configuration fragments?

A2: Partially solved. I only added support for monolithic configurations. We still have to discuss and design the modular configurations.

Q3: Should we keep all the syntax declarations in Kore like in K, so that we can use the priority, prefer, bracket, etc., information for unparsing?

Clone this wiki locally