---
title: "CS132: Compiler Construction"
subtitle: Professor Palsberg
author: Thilan Tran
date: Fall 2020
mainfont: Libertinus Serif
monofont: Iosevka
fontsize: 14pt
toc: true
documentclass: extarticle
header-includes: |
  \definecolor{Light}{HTML}{F4F4F4}
  \let\oldtexttt\texttt
  \renewcommand{\texttt}[1]{
    \colorbox{Light}{\oldtexttt{#1}}
  }
  \usepackage{fancyhdr}
  \pagestyle{fancy}
  \usepackage{tikz}
---
\newpage{}
- a compiler is a program that translates an executable program in one language to an executable program in another language
- an interpreter is a program that reads an executable program and produces the results of running that program
  - usually involves executing the source program in some fashion, ie. portions at a time
- compiler construction is a microcosm of CS fields:
  - AI and algorithms
  - theory
  - systems
  - architecture
- in addition, the field is not a solved problem:
  - changes in architecture lead to changes in compilers
    - new concerns, re-engineering, etc.
  - compiler changes then prompt new architecture changes, eg. new languages and features
- some compiler motivations:
  - correct output
  - fast output
  - fast translation (proportional to program size)
  - support separate compilation
  - diagnostics for errors
  - works well with debuggers
  - cross-language calls
  - optimization
- for new languages, how are compilers written for them?
  - eg. early compilers for Java were written in C
  - eg. for early, low-level languages like C, bootstrapping is done:
    - a little subset of C is written and compiled in machine code
    - then a larger subset of C is compiled using that smaller subset, etc.
- abstract compiler system overview:
  - input: source code
  - output: machine code or errors
  - recognizes illegal programs, and outputs associated errors
- two-pass compiler overview:
  - source code, eg. Java, compiles through a frontend to an intermediate representation (IR) like Sparrow
  - the frontend part of the compiler maps legal code into IR:
    - language dependent, but machine independent
    - allows for swappable frontends for different source languages
  - the IR then compiles through a backend to machine code
  - the backend part maps IR onto the target machine:
    - language independent, but machine / architecture dependent
- frontend overview:
  - input: source code
  - output: IR
  - responsibilities:
    - recognize legality syntactically
    - produce meaningful error messages
    - shape the code for the backend
  - the scanner produces a stream of tokens from source code:
    - ie. lexing the source file into tokens
  - the parser produces the IR:
    - recognizes context-free grammars, while guiding context-sensitive analysis
  - both steps can be automated to some degree
- backend overview:
  - input: IR
  - output: target machine code
  - responsibilities:
    - translate to machine code
    - instruction selection:
      - choose specific instructions for each IR operation
      - produce compact, fast code
    - register allocation:
      - decide what to keep in registers at each point
      - can move loads and stores
      - optimal allocation is difficult
  - more difficult to automate
- specific frontends or backends can be swapped:
  - eg. use a special backend that targets ARM instead of RISC-V, etc.
- middleend overview:
  - responsibilities:
    - optimize code and perform code improvement by analyzing and changing the IR
    - must preserve values while reducing runtime
  - optimizations are usually designed as a set of iterative passes through the compiler
    - eg. eliminating redundant stores or dead code, storing common subexpressions, etc.
    - eg. GCC has 150 optimizations built in
\newpage{}
- the role of the scanner is to map characters into tokens, the basic unit of syntax:
  - while eliminating whitespace, comments, etc.
  - the character string value for a token is a lexeme
  - eg. `x = x + y;` becomes `<id,x> = <id,x> + <id,y> ;`
- a scanner must recognize language syntax:
  - how to define the syntax for integers, decimals, etc.?
  - use regular expressions to specify syntax patterns:
    - eg. the syntax pattern for an integer may be `<integer> ::= (+ | -) <digit>*`
  - regular expressions can then be constructed into a deterministic finite automaton (DFA):
    - a series of states and transitions for accepting or rejecting characters
    - this step also handles state minimization
  - the DFA can be easily converted into code using a while loop and states (a table-driven sketch in Java follows this list):
    - by using a table that categorizes characters into their language-specific identifier types or classes, this code can be language independent
      - as long as the underlying DFA is the same
    - a linear operation, considers each character once
  - this process can be automated using scanner generators:
    - emit scanner code that may be direct code, or table driven
    - eg. `lex` is a UNIX scanner generator that emits C code
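As a concrete illustration of the table-driven loop, here is a minimal sketch in Java; the character classes, transition table, and the integer-only token are illustrative assumptions rather than anything emitted by a real scanner generator:

```java
// Minimal sketch: a table-driven DFA that recognizes unsigned integer literals.
public class IntScanner {
    // character classes: 0 = digit, 1 = anything else
    static int charClass(char c) {
        return Character.isDigit(c) ? 0 : 1;
    }

    // transition table indexed by [state][charClass]; -1 means reject
    //   state 0: start, state 1: seen at least one digit (accepting)
    static final int[][] delta = {
        { 1, -1 },   // from start: digit -> 1, other -> reject
        { 1, -1 },   // from accepting: digit -> 1, other -> reject
    };

    // returns true iff the whole input is accepted by the DFA
    static boolean scanInt(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {     // linear: each character considered once
            state = delta[state][charClass(c)];
            if (state == -1) return false;
        }
        return state == 1;                       // must end in the accepting state
    }

    public static void main(String[] args) {
        System.out.println(scanInt("1234"));  // true
        System.out.println(scanInt("12a4"));  // false
    }
}
```

Only the table `delta` and `charClass` would change for a different token language; the driver loop stays the same, which is what makes the generated code language independent.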
\newpage{}
- the role of the parser is to recognize whether a stream of tokens forms a program defined by some grammar:
  - performs context-free syntax analysis
  - usually constructs an IR
  - produces meaningful error messages
  - generally want to achieve linear time when parsing:
    - need to impose some restrictions to achieve this, eg. the LL restriction
- context-free syntax is defined by a context-free grammar (CFG):
  - formally, a 4-tuple $G = (V_t, V_n, S, P)$ where:
    - $V_t$ is the set of terminal symbols, ie. tokens returned by the scanner
    - $V_n$ is the set of nonterminal symbols, ie. syntactic variables that denote substrings in the language
    - $S$ is a distinguished nonterminal representing the start symbol or goal
    - $P$ is a finite set of productions specifying how terminals and nonterminals can be combined
      - each production has a single nonterminal on the LHS
    - the vocabulary of a grammar is $V = V_t \cup V_n$
  - the motivation for using CFGs instead of simple REs for grammars is that REs are not powerful enough:
    - REs are used to classify tokens such as identifiers, numbers, keywords
    - while grammars are useful for counting brackets, or imparting structure eg. expressions
    - factoring out lexical analysis simplifies the CFG dramatically
- general CFG notation:
  - $a, b, c, ... \in V_t$
  - $A, B, C, ... \in V_n$
  - $U, V, W, ... \in V$
  - $\alpha, \beta, \gamma, ... \in V^*$, where $V^*$ is a sequence of symbols
  - $u, v, w, ... \in V_t^*$, where $V_t^*$ is a sequence of terminals
  - $A \rightarrow \gamma$ is a production
  - $\Rightarrow, \Rightarrow^*, \Rightarrow^+$ represent derivations of $1$, $\geq 0$, and $\geq 1$ steps
  - if $S \Rightarrow^* \beta$, then $\beta$ is a sentential form of $G$
  - if $\beta \in L(G) = \{\beta \in V^* \mid S \Rightarrow^* \beta\} \cap V_t^*$, then $\beta$ is a sentence of $G$, ie. a sentential form containing only terminals
- grammars are often written in Backus-Naur form (BNF):
  - nonterminals are represented with angle brackets
  - terminals are represented in monospace font or underlined
  - productions follow the form `<nont> ::= ...expr...`
- the productions of a CFG can be viewed as rewriting rules:
  - by repeatedly rewriting rules by replacing nonterminals (starting from the goal symbol), we can derive a sentence of a programming language
    - leftmost derivation occurs when the leftmost nonterminal is replaced at each step
    - rightmost derivation occurs when the rightmost nonterminal is replaced at each step
  - this sequence of rewrites is a derivation or parse
  - discovering a derivation (ie. going backwards) is called parsing
- can also visualize the derivation process as constructing a tree:
  - the goal symbol is the root of the tree
  - the children of a node represent replacing a nonterminal with the RHS of its production
  - note that the ordering of the tree dictates how the program would be executed
    - can multiple syntaxes lead to different parse trees depending on the CFG used?
  - parsing can be done top-down, from the root of the derivation tree:
    - picks a production to try and match input using backtracking
    - some grammars are backtrack-free, ie. predictive
  - parsing can also be done bottom-up:
    - start in a state valid for legal first tokens, ie. start at the leaves and fill in
    - as input is consumed, change state to encode possibilities, ie. recognize valid prefixes
    - use a stack to store state and sentential forms
-
try and find a linear parsing algorithm using top-down parsing
- general top-down parsing approach:
  - select a production corresponding to the current node, and construct the appropriate children
    - want to select the right production, somehow guided by the input string
  - when a terminal is added to the fringe that doesn't match the input string, backtrack
  - find the next nonterminal to expand
- problems that will make the algorithm run worse than linear:
  - too much backtracking
  - if the parser makes the wrong choices, expansion doesn't even terminate
    - ie. top-down parsers cannot handle left-recursion
- top-down parsers may backtrack when they select the wrong production:
  - do we need arbitrary lookahead to parse CFGs? generally, yes
  - however, large subclasses of CFGs can be parsed with limited lookahead:
    - LL(1): left-to-right scan, leftmost derivation, 1-token lookahead
    - LR(1): left-to-right scan, rightmost derivation, 1-token lookahead
- to achieve LL(1) we roughly want to have the following initial properties:
  - no left recursion
  - some sort of predictive parsing, in order to minimize backtracking with a lookahead of only one symbol
Consider the following simple grammar for mathematical operations:
<goal> ::= <expr>
<expr> ::= <expr> <op> <expr> | num | id
<op> ::= + | - | * | /
- there are multiple ways to rewrite the same grammar:
  - but each of these ways may build different trees, which lead to different executions
  - want to avoid possible grammar issues such as precedence, infinite recursion, etc. by rewriting the grammar
  - eg. the classic precedence issue of parsing `x + y * z` as `(x+y) * z` vs. `x + (y*z)`
- to address precedence:
  - additional machinery is required in the grammar
  - introduce extra levels
    - eg. introduce new nonterminals that group higher-precedence ops like multiplication, and ones that group lower-precedence ops like addition
    - the higher-precedence nonterminal cannot reduce down to the lower-precedence nonterminal
    - forces the correct tree
Example of fixing precedence in our grammar:
<expr> ::= <expr> + <term> | <expr> - <term> | <term>
<term> ::= <term> * <factor> | <term> / <factor> | <factor>
<factor> ::= num | id
- ambiguity occurs when a grammar has more than one derivation for a single sentential form:
  - eg. the classic dangling-else ambiguity `if A then if B then C else D`
  - to address ambiguity:
    - rearrange the grammar to select one of the derivations, eg. matching each `else` with the closest unmatched `then`
  - another possible ambiguity arises from the context-free specification:
    - eg. overloading such as `f(17)` could be a function call or a variable subscript
      - requires context to disambiguate, really an issue of type
      - rather than complicate parsing, this should be handled separately
Example of fixing the dangling-else ambiguity:
<stmt> ::= <matched> | <unmatched>
<matched> ::= if <expr> then <matched> else <matched> | ...
<unmatched> ::= if <expr> then <stmt> | if <expr> then <matched> else <unmatched>
- a grammar is left-recursive if $\exists A \in V_n$ s.t. $A \Rightarrow^* A\alpha$ for some string $\alpha$:
  - top-down parsers fail with left-recursive grammars
  - to address left-recursion:
    - transform the grammar to become right-recursive by introducing new nonterminals
    - eg. in grammar notation, replace the productions $A \rightarrow A\alpha \mid \beta \mid \gamma$ with:
      - $A \rightarrow NA'$
      - $N \rightarrow \beta \mid \gamma$
      - $A' \rightarrow \alpha A' \mid \varepsilon$
Example of fixing left-recursion (for `<expr>`, `<term>`) in our grammar:
<expr> ::= <term> <expr'>
<expr'> ::= + <term> <expr'> | - <term> <expr'> | E // epsilon
<term> ::= <factor> <term'>
<term'> ::= * <factor> <term'> | / <factor> <term'> | E
- to perform left-factoring on a grammar, we want to do repeated prefix factoring until no two alternatives for a single nonterminal have a common prefix:
  - an important property for LL(1) grammars
  - eg. in grammar notation, replace the productions $A \rightarrow \alpha\beta \mid \alpha\gamma$ with:
    - $A \rightarrow \alpha A'$
    - $A' \rightarrow \beta \mid \gamma$
  - note that our example grammar after removing left-recursion is now properly left-factored
- for multiple productions, we would like a distinct way of choosing the correct production to expand:
  - for some RHS $\alpha \in G$, define $\textit{FIRST}(\alpha)$ as the set of tokens that can appear first in some string derived from $\alpha$
  - key property: whenever two productions $A \rightarrow \alpha$ and $A \rightarrow \beta$ both appear in the grammar, we would like:
    - $\textit{FIRST}(\alpha) \cap \textit{FIRST}(\beta) = \emptyset$, ie. the two token sets are disjoint
  - this property of left-factoring would allow the parser to make a correct choice with a lookahead of only one symbol
  - if the grammar does not have this property, we can hack the grammar
- by left-factoring and eliminating left-recursion, can we transform an arbitrary CFG to a form where it can be predictively parsed with a single token lookahead?
  - no, it is undecidable whether an arbitrary equivalent grammar exists that satisfies the conditions
  - eg. the language $\{a^n 0 b^n\} \cup \{a^n 1 b^{2n}\}$ does not have a satisfying grammar, since we would have to look past an arbitrary number of $a$ to discover whether the middle terminal is 0 or 1
- idea to translate parsing logic to code:
  - for all terminal symbols, call an `eat` function that consumes the next char in the input stream
  - for all nonterminal symbols, call the function corresponding to the production of that nonterminal
  - perform predictive parsing by looking ahead to the next character and handling it accordingly
    - there is only one valid way to handle the character in this step due to the left-factoring property
    - how do we handle epsilon?
      - just do nothing ie. consume nothing, and let recursion handle the rest
  - creates a mutually recursive set of functions, one for each production
    - the name is the LHS of the production, and the body corresponds to the RHS of the production
Example simple recursive descent parser:
Token token;
void eat(char a) {
if (token == a) token = next_token();
else error();
}
void goal() { token = next_token(); expr(); eat(EOF); }
void expr() { term(); expr_prime(); }
void expr_prime() {
if (token == PLUS) { eat(PLUS); expr(); }
else if (token == MINUS) { eat(MINUS); expr(); }
else { /* noop for epsilon */ }
}
void term() { factor(); term_prime(); }
void term_prime() {
if (token == MULT) { eat(MULT); term(); }
else if (token == DIV) { eat(DIV); term(); }
else { }
}
void factor() {
if (token == NUM) eat(NUM);
else if (token == ID) eat(ID);
else error(); // not epsilon here
}
-
handling epsilon is not as simple as just ignoring it in the descent parser
- for a string of grammar symbols $\alpha$, $\textit{NULLABLE}(\alpha)$ means $\alpha$ can go to $\varepsilon$:
  - ie. $\textit{NULLABLE}(\alpha) \Longleftrightarrow \alpha \Rightarrow^* \varepsilon$
- to compute $\textit{NULLABLE}$:
  - if a symbol $a$ is terminal, it cannot be nullable
  - otherwise if $A \rightarrow Y_1...Y_n$ is a production:
    - $\textit{NULLABLE}(Y_1) \wedge ... \wedge \textit{NULLABLE}(Y_n) \Rightarrow \textit{NULLABLE}(A)$
  - solve the constraints
- again, for a string of grammar symbols $\alpha$, $\textit{FIRST}(\alpha)$ is the set of terminal symbols that begin strings derived from $\alpha$:
  - ie. $\textit{FIRST}(\alpha) = \{a \in V_t \mid \alpha \Rightarrow^* a\beta\}$
- to compute $\textit{FIRST}$:
  - if a symbol $a$ is terminal, $\textit{FIRST}(a) = \{a\}$
  - otherwise if $A \rightarrow Y_1...Y_n$ is a production:
    - $\textit{FIRST}(Y_1) \subseteq \textit{FIRST}(A)$
    - $\forall i \in 2...n$, if $\textit{NULLABLE}(Y_1...Y_{i-1})$: $\textit{FIRST}(Y_i) \subseteq \textit{FIRST}(A)$
  - solve the constraints, going for the $\subseteq$-least solution
- for a nonterminal $B$, $\textit{FOLLOW}(B)$ is the set of terminals that can appear immediately to the right of $B$ in some sentential form:
  - ie. $\textit{FOLLOW}(B) = \{a \in V_t \mid G \Rightarrow^* \alpha B \beta \wedge a \in \textit{FIRST}(\beta)\}$
- to compute $\textit{FOLLOW}$ (a fixed-point sketch for these sets follows this list):
  - $\{\$\} \subseteq \textit{FOLLOW}(G)$ where $G$ is the goal
  - if $A \rightarrow \alpha B \beta$ is a production:
    - $\textit{FIRST}(\beta) \subseteq \textit{FOLLOW}(B)$
    - if $\textit{NULLABLE}(\beta)$, then $\textit{FOLLOW}(A) \subseteq \textit{FOLLOW}(B)$
  - solve the constraints, going for the $\subseteq$-least solution
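A minimal sketch of solving these constraints by iterating until nothing changes, assuming a toy encoding of the left-factored expression grammar above; `FOLLOW` can be computed with the same iterate-until-stable loop:

```java
import java.util.*;

// Minimal sketch: computing NULLABLE and FIRST by fixed-point iteration.
public class FirstSets {
    // productions: LHS followed by its RHS symbols; an RHS of length 0 is epsilon.
    // Lowercase / punctuation strings are terminals, capitalized strings are nonterminals.
    static String[][] prods = {
        {"Expr", "Term", "Expr'"},
        {"Expr'", "+", "Term", "Expr'"}, {"Expr'", "-", "Term", "Expr'"}, {"Expr'"},
        {"Term", "Factor", "Term'"},
        {"Term'", "*", "Factor", "Term'"}, {"Term'", "/", "Factor", "Term'"}, {"Term'"},
        {"Factor", "num"}, {"Factor", "id"},
    };

    static boolean isTerminal(String s) {
        return Character.isLowerCase(s.charAt(0)) || !Character.isLetter(s.charAt(0));
    }

    public static void main(String[] args) {
        Set<String> nullable = new HashSet<>();
        Map<String, Set<String>> first = new HashMap<>();
        boolean changed = true;
        while (changed) {                         // iterate until no set grows
            changed = false;
            for (String[] p : prods) {
                String lhs = p[0];
                first.putIfAbsent(lhs, new HashSet<>());
                boolean allNullable = true;       // tracks NULLABLE(Y_1...Y_{i-1})
                for (int i = 1; i < p.length; i++) {
                    String y = p[i];
                    if (allNullable) {            // FIRST(Y_i) <= FIRST(lhs)
                        Set<String> f = isTerminal(y) ? Set.of(y) : first.getOrDefault(y, Set.of());
                        if (first.get(lhs).addAll(f)) changed = true;
                    }
                    allNullable &= !isTerminal(y) && nullable.contains(y);
                }
                if (allNullable && nullable.add(lhs)) changed = true;
            }
        }
        System.out.println("NULLABLE: " + nullable);  // contains Expr' and Term'
        System.out.println("FIRST: " + first);        // eg. FIRST(Expr) contains num, id
    }
}
```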
- a grammar $G$ is LL(1) iff. for each production $A \rightarrow \alpha_1 \mid \alpha_2 \mid ... \mid \alpha_n$:
  - $\textit{FIRST}(\alpha_1), ..., \textit{FIRST}(\alpha_n)$ are pairwise disjoint
  - if $\textit{NULLABLE}(\alpha_i)$, then for all $j \in 1...n$ with $j \neq i$: $\textit{FIRST}(\alpha_j) \cap \textit{FOLLOW}(A) = \emptyset$
  - if $G$ is $\varepsilon$-free, the first condition is sufficient
  - eg. $S \rightarrow aS \mid a$ is not LL(1)
    - while $S \rightarrow aS'$, $S' \rightarrow aS' \mid \varepsilon$ accepts the same language and is LL(1)
- provable facts about LL(1) grammars:
  - no left-recursive grammar is LL(1)
  - no ambiguous grammar is LL(1)
  - some languages have no LL(1) grammar
  - an $\varepsilon$-free grammar where each alternative expansion for $A$ begins with a distinct terminal is a simple LL(1) grammar
- an LL(1) parse table $M$ can be constructed from a grammar $G$ as follows (a driver sketch using such a table follows this list):
  - $\forall$ productions $A \rightarrow \alpha$:
    - $\forall a \in \textit{FIRST}(\alpha)$, add $A \rightarrow \alpha$ to $M[A, a]$
    - if $\varepsilon \in \textit{FIRST}(\alpha)$:
      - $\forall b \in \textit{FOLLOW}(A)$, add $A \rightarrow \alpha$ to $M[A, b]$ (including EOF)
  - set each undefined entry of $M$ to an error state
  - if $\exists M[A, a]$ with multiple entries, then the grammar is not LL(1)
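For illustration, a minimal table-driven LL(1) driver sketch; the tiny grammar `S -> aS | b` and its hand-built table `M` are assumptions, chosen so each table cell has a single entry (its alternatives start with distinct terminals, so it is simple LL(1)):

```java
import java.util.*;

// Minimal sketch: the classic stack-based LL(1) parsing loop over a parse table.
public class LL1Driver {
    // M[nonterminal][lookahead] = RHS to expand to (nonterminals are uppercase)
    static Map<String, Map<String, String[]>> M = Map.of(
        "S", Map.of(
            "a", new String[]{"a", "S"},
            "b", new String[]{"b"}));

    static boolean parse(List<String> input) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");            // end marker
        stack.push("S");            // start symbol
        int pos = 0;
        while (!stack.peek().equals("$")) {
            String top = stack.pop();
            String look = pos < input.size() ? input.get(pos) : "$";
            if (Character.isUpperCase(top.charAt(0))) {       // nonterminal: expand
                String[] rhs = M.getOrDefault(top, Map.of()).get(look);
                if (rhs == null) return false;                // undefined entry = error
                for (int i = rhs.length - 1; i >= 0; i--)     // push RHS reversed
                    stack.push(rhs[i]);
            } else if (top.equals(look)) {
                pos++;                                        // terminal: match input
            } else {
                return false;
            }
        }
        return pos == input.size();                           // all input consumed
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("a", "a", "b")));  // true
        System.out.println(parse(List.of("a", "a")));       // false
    }
}
```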
- recalling definitions:
  - for a grammar $G$ with start symbol $S$, any string $\alpha$ such that $S \Rightarrow^* \alpha$ is a sentential form
  - if $\alpha \in V^*_t$, then $\alpha$ is a sentence in $L(G)$
  - a left-sentential form is one that occurs in the leftmost derivation of some sentence
  - a right-sentential form is one that occurs in the rightmost derivation of some sentence
- bottom-up parsing, ie. LR parsing, is an alternative to parsing top-down:
  - want to construct a parse tree by starting at the leaves and working to the root
  - repeatedly, match a right-sentential form from the language against the tree
    - ie. for each match, apply some sort of reduction to build on the frontier of the tree
  - conversely to LL parsing, LR parsing prefers left recursion
  - creates a rightmost derivation, in reverse
  - eg. given the grammar `S -> aABe, A -> Abc|b, B -> d`:
    - parse `abbcde` by replacing terminals with nonterminals repeatedly, ie. applying productions backwards
    - `abbcde` $\rightarrow$ `aAbcde` $\rightarrow$ `aAde` $\rightarrow$ `aABe` $\rightarrow$ `S`
- must scan input and find some kind of a valid sentential form:
  - this valid form is called a handle:
    - a handle $\alpha$ is a substring that matches the production $A \rightarrow \alpha$, where reducing $\alpha$ to $A$ is one step in the reverse of a rightmost derivation
    - formally, if $S \Rightarrow^*_{rm} \alpha A w \Rightarrow_{rm} \alpha\beta w$, then $A \rightarrow \beta$ can be used as a handle
    - right-sentential, so all symbols in $w$ are terminals
  - the process of reducing through handles is called handle pruning
  - how can we find which unique handle to apply?
    - recognizing handles is out of the scope of this course
- important theorem involving handles:
  - if $G$ is unambiguous, then every right-sentential form has a unique handle
    - this is because an unambiguous grammar has a unique rightmost derivation
- general LR algorithm:
  - repeatedly, working backwards, find the handle $A_i \rightarrow \beta_i$ in the current right-sentential form
  - replace ie. prune the handle to generate $\gamma_{i-1}$
- one way to implement a handle-pruning, bottom-up parser is called a shift-reduce parser:
  - uses a stack (handles the shift operation) and an input buffer
  - shift-reduce vs. recursive-descent
- algorithm:
  - initialize the stack with the EOF marker `$`
  - repeat until the top of the stack is the goal symbol and the input token is `$`:
    - find the handle:
      - if we don't have a handle on top of the stack, shift an input symbol onto the stack
      - ie. shift until the top of the stack is the right end of a handle
    - prune the handle:
      - if we have a handle $A \rightarrow \beta$ on the stack, reduce by:
        - popping $|\beta|$ symbols off the stack
        - pushing $A$ onto the stack
Simple left-recursive expression grammar:
<goal> ::= <expr>
<expr> ::= <expr> + <term> | <expr> - <term> | <term>
<term> ::= <term> * <factor> | <term> / <factor> | <factor>
<factor> ::= num | id
// note that left-recursion is fine with LR
Running the example input `id - num * id` through a shift-reduce parser:
// stack
$ // shift next input onto stack
$ id // can reduce, id is a handle
$ <factor>
$ <term>
$ <expr> // out of handles, shift next input
$ <expr> -
$ <expr> - num // can reduce num at top of stack
$ <expr> - <factor>
$ <expr> - <term> // reducing here would disregard the rest of the input stream
$ <expr> - <term> *
$ <expr> - <term> * id
$ <expr> - <term> * <factor> // by looking at the input, we should reduce using
// <term> ::= <term> * <factor> instead of <term> ::= <factor>
$ <expr> - <term>
$ <expr>
$ <goal> // this would be accepted
- shift-reduce parsers support four actions:
  - shift the next input symbol onto the stack
  - reduce by popping the handle off the stack and pushing the production LHS
  - accept and end parsing
  - error
- the key problem is recognizing handles
- LR vs. LL grammars and parsing:
  - almost all CFG languages can be expressed in LR(1)
  - LR parsers detect an error as soon as possible
  - naturally, right recursion is used in top-down parsers
    - while left recursion is used in bottom-up parsers
  - the LL dilemma is to pick between different alternate production rules:
    - $A \rightarrow b$ vs. $A \rightarrow c$
    - ie. an LL(k) parser must recognize the use of a production after seeing only the first $k$ symbols of its RHS
  - the LR dilemma is to pick between different matching productions:
    - $A \rightarrow b$ vs. $B \rightarrow b$
    - ie. an LR(k) parser must recognize the RHS of a production after having seen all that is derived from the RHS, with $k$ symbols of lookahead
  - in an LL parsing table:
    - only need to store nonterminals vs. terminals
  - in an LR parsing table:
    - need to store all the possible entries at the top of the stack vs. terminals
      - the top of the stack can contain every alternate in a production rule
      - usually much greater than the number of nonterminals
    - more complicated and space consuming, but more powerful as well
  - in addition, $LL(k) \subseteq LR(k)$
- the Java Compiler Compiler (JCC) generates a parser automatically for a given grammar:
  - based on LL(k), not just LL(1)
  - transforms an EBNF grammar into a parser
  - can have embedded (additional) action code written in Java
  - usage: `javacc fortran.jj` $\rightarrow$ `javac Main.java` $\rightarrow$ `java Main < prog.f`
JavaCC input format:
TOKEN :
{
< INTEGER_LITERAL: ( ["1"-"9"] (["0"-"9"])* | "0" ) >
}
void StatementListReturn() :
{}
{
( Statement() )* "return" Expression() ";"
}
\newpage{}
- parsers generate a syntax tree from an input file:
  - this is an aside on design patterns in order to facilitate using the generated tree
  - see Gamma's Design Patterns from 1995
- for OOP, the visitor pattern enables the definition of a new operation on an object structure without changing the classes of the objects:
  - ie. a new operation without recompiling
  - the set of classes must be fixed in advance, and each class must have a hook called the `accept` method
Consider the problem of summing up lists using the following list implementation:
interface List {}
class Nil implements List {}
class Cons implements List {
int head;
List tail;
}
First approach using type casts:
List l;
int sum = 0;
while (true) {
if (l instanceof Nil)
break;
else if (l instanceof Cons) {
sum += ((Cons) l).head;
l = ((Cons) l).tail;
}
}
- pros:
  - code is written without touching the classes
- cons:
  - code constantly uses type casts and `instanceof` to determine classes
Second approach using dedicated methods (OO version):
interface List { int sum(); }
class Nil implements List {
public int sum() { return 0; }
}
class Cons implements List {
int head;
List tail;
public int sum() { return head + tail.sum(); }
}
- pros:
  - code can be written more systematically, without casts
- cons:
  - for each new operation, need to write new dedicated methods and recompile
- visitor pattern approach:
  - divide the code into an object structure and a visitor (akin to functional programming)
  - insert an `accept` method in each class, which takes a Visitor as an argument
  - a visitor contains a `visit` method for each class (using overloading)
    - defines both actions and access of subobjects
  - pros:
    - new methods without recompilation
    - no frequent type casts
  - cons:
    - all classes need a hook in the `accept` method
  - used by tools such as JJTree, Java Tree Builder, JCC
  - summary, visitors:
    - make adding new operations easy
    - gather related operations
    - can accumulate state
    - can break encapsulation, since they need access to internal operations
Third approach with visitor pattern:
interface List {
// door open to let in a visitor into class internals
void accept(Visitor v);
}
interface Visitor {
void visit(Nil x); // code is packaged into a visitor
void visit(Cons x);
}
class Nil implements List {
// `this` is statically defined by the *enclosing* class
public void accept(Visitor v) { v.visit(this); }
}
class Cons implements List {
int head;
List tail;
public void accept(Visitor v) { v.visit(this); }
}
class SumVisitor implements Visitor {
int sum = 0;
public void visit(Nil x) {}
public void visit(Cons x) {
// take an action:
sum += x.head;
// handle subobjects:
x.tail.accept(this); // process tail *indirectly* recursively
// The accept call will in turn call visit...
// This pattern is called *double dispatching*.
// Why not just visit(x.tail) ?
// This *fails*, since x.tail is type List.
}
}
Using `SumVisitor`:
SumVisitor sv = new SumVisitor();
l.accept(sv);
System.out.println(sv.sum);
- the produced JavaCC grammar can be processed by the JCC to give a parser that produces syntax trees:
  - the produced syntax trees can be traversed by a Java program by writing subclasses of the default visitor
  - the JavaCC grammar feeds into the Java Tree Builder (JTB)
    - JTB creates a new JavaCC grammar with embedded Java code, syntax-tree-node classes, and a default visitor
    - the new JavaCC grammar feeds into the JCC, which creates a parser
Pipeline: `jtb fortran.jj` $\rightarrow$ `javacc jtb.out.jj` $\rightarrow$ `javac Main.java` $\rightarrow$ `java Main < prog.f`
Translating a grammar production with JTB:
// .jj grammar
void assignment() :
{}
{ PrimaryExpression() AssignmentOperator() Expression() }
// jtb.out.jj with embedded java code that builds syntax tree
Assignment Assignment () :
{
PrimaryExpression n0;
AssignmentOperator n1;
Expression n2; {}
}
{
n0 = PrimaryExpression()
n1 = AssignmentOperator()
n2 = Expression()
{ return new Assignment(n0, n1, n2); }
}
JTB creates this syntax-tree-node class representing `Assignment`:
public class Assignment implements Node {
PrimaryExpression f0;
AssignmentOperator f1;
Expression f2;
public Assignment(PrimaryExpression n0,
AssignmentOperator n1, Expression n2) {
f0 = n0; f1 = n1; f2 = n2;
}
public void accept(visitor.Visitor v) {
v.visit(this);
}
}
Default DFS visitor:
public class DepthFirstVisitor implements Visitor {
...
// f0 -> PrimaryExpression()
// f1 -> AssignmentOperator()
// f2 -> Expression()
public void visit(Assignment n) {
// no action taken on current node,
// then recurse on subobjects
n.f0.accept(this);
n.f1.accept(this);
n.f2.accept(this);
}
}
Example visitor to print LHS of assignments:
public class PrinterVisitor extends DepthFirstVisitor {
public void visit(Assignment n) {
// printing identifier on LHS
System.out.println(n.f0.f0.toString());
// no need to recurse into subobjects since assignments cannot be nested
}
}
\newpage{}
- a program may follow a grammar and parse properly, but other problems may remain
  - eg. type errors, use of undeclared variables, cyclical inheritance, etc.
- eg. in Java, `5 + true` gives a type error:
  - this type rule could be expressed as:
    - if two expressions have type `int`, then their addition will also be of type `int`
  - in typical notation: $$\frac{a: \oldtexttt{int} \quad b: \oldtexttt{int}}{\oldtexttt{a+b}: \oldtexttt{int}}$$
    - in this notation, the conclusion appears under the bar, with multiple hypotheses above the bar
    - ie. if the hypotheses are true, then the conclusion is true
    - to check this rule, recursively check if $e_1$ is of type `int`, and then if $e_2$ is also of type `int`
  - when given `5: int` and `true: boolean`, the type checker will see that the types don't obey the rule, and should throw an error
Implementing a simple type checker:
class TypeChecker extends Visitor {
// f1: Expression (e1)
// f2: Expression (e2)
Type visit(Plus n) { // should either return the result type or throw an error
// recursive calls for subexpressions that *could* individually fail
Type t1 = n.f1.accept(this);
Type t2 = n.f2.accept(this);
// making a decision, ie. actual type checking
if (t1 == "int" && t2 == "int") { // need some mechanic to check type equality
return "int";
} else {
throw new RuntimeException();
}
}
}
- what about more complex compound type rules, eg. for `3 + (5 + 7)`?
  - simply by following the recursive calls, the previous type checker would still successfully check the type
    - ie. type checking in a DFS manner
- handling the simple expression `true`: $$\frac{}{\oldtexttt{true}: \oldtexttt{boolean}}$$
- handling boolean negation: $$\frac{e: \oldtexttt{boolean}}{\oldtexttt{!e}: \oldtexttt{boolean}}$$
- handling ternary expressions: $$\frac{a: \oldtexttt{boolean} \quad b: t \quad c: t}{\oldtexttt{(a ? b : c)}: t}$$
  - $t$ is a type variable since $b$ and $c$ should have the same type
- different types of statements in MiniJava:
  - `System.out.println(e)`, assignments, `if` and `while` statements
- unlike expressions, statements don't return anything:
  - they may have side effects, but do not have their own types
  - type checkers would only either return silently or throw an error
    - no overall type value to return
  - in typical notation, $\vdash$ means that a sentence type-checks
    - not necessarily that it is a particular type
- handling `System.out.println`: $$\frac{\vdash e: \oldtexttt{int}}{\vdash \oldtexttt{System.out.println(e)}}$$
- handling `if` statements: $$\frac{\vdash e: \oldtexttt{boolean} \quad \vdash a \quad \vdash b}{\vdash \oldtexttt{if (e) a else b}}$$
- handling `while` statements: $$\frac{\vdash e: \oldtexttt{boolean} \quad \vdash s}{\vdash \oldtexttt{while (e) s}}$$
- for declared variables, how can we track what specific identifiers represent?
  - create and add to a symbol table that caches the declaration of variables with types
  - maps identifiers to types (a minimal sketch follows the example below)
- type checking rule for an assignment statement, given the symbol table $A$: $$\frac{A(x) = t \quad A \vdash e: t}{A \vdash \oldtexttt{x=e}}$$
  - by convention, $A$ should go before each $\vdash$ in all type checking rules
    - this is because any subexpression may contain variables
    - ie. $A \vdash e:t$ can be read as: expression $e$, given program context from table $A$, can lead to conclusion $t$ when type checking
Symbol table example:
class C {
boolean f;
t m(int a) {
int x,
...
x = a + 5;
...
}
}
// symbol table contains:
// id | type
// ---------
// f | boolean
// a | int
// x | int
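A minimal symbol-table sketch matching the example above (the `Entry` record and the `declare`/`lookup` names are assumptions, not the provided skeleton); lookup searches from the most recently added entry, so shadowing picks the closest declaration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch: a symbol table as a stack of (identifier, type) entries.
public class SymbolTable {
    record Entry(String id, String type) {}

    private final Deque<Entry> entries = new ArrayDeque<>();

    // record a declaration, eg. when visiting fields, parameters, or locals
    public void declare(String id, String type) {
        entries.push(new Entry(id, type));
    }

    // A(x): look up the type of x, searching from the most recent entry
    public String lookup(String id) {
        for (Entry e : entries) {               // iteration order: most recent first
            if (e.id().equals(id)) return e.type();
        }
        throw new RuntimeException("undeclared identifier: " + id);
    }

    public static void main(String[] args) {
        SymbolTable a = new SymbolTable();
        a.declare("f", "boolean");   // class field
        a.declare("a", "int");       // method parameter
        a.declare("x", "int");       // local variable
        System.out.println(a.lookup("a"));   // int
        System.out.println(a.lookup("f"));   // boolean
    }
}
```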
- to type check a variable, just look it up in the symbol table: $$\frac{A(x) = t}{A \vdash \oldtexttt{x}: t}$$
  - rewriting our symbol table lookup using the previous notation
- to type check `x = a + 5`: $$\frac{A \vdash a: \oldtexttt{int} \quad A \vdash 5: \oldtexttt{int}}{A \vdash \oldtexttt{a+5}: \oldtexttt{int}} \rightarrow \frac{A \vdash x: \oldtexttt{int} \quad A \vdash a+5: \oldtexttt{int}}{A \vdash \oldtexttt{x=a+5}}$$
Example of variable shadowing:
class a {
int a;
int a(int a) {
// int a; // compiler error, can't redeclare parameter
... a + 5 ... // checks closest `a`, ie. read from bottom of symbol table
return 0;
}
}
- type checking arrays:
  - expressions like `arr[idx]`, `new int[len]`, `arr.length`
  - statements like `arr[idx] = val`
  - note that it is unreasonable for the type checker to check out-of-bound indices
    - would require the type checker to do some arithmetic in this stage
- handling array indexing: $$\frac{A \vdash a: \oldtexttt{int[]} \quad A \vdash b: \oldtexttt{int}}{A \vdash \oldtexttt{a[b]}: \oldtexttt{int}}$$
  - in MiniJava, the only array type is `int[]`, but arrays can generally hold any type $t$
  - note that the `int` in the conclusion refers to the array element type, while the `int` in the hypothesis refers to the index type
  - note that this rule is an elimination rule, since the `int[]` type is consumed into an `int` in the conclusion
- handling array assignments: $$\frac{A \vdash x: \oldtexttt{int[]} \quad A \vdash i: \oldtexttt{int} \quad A \vdash v: \oldtexttt{int}}{A \vdash \oldtexttt{x[i] = v}}$$
- handling array constructions: $$\frac{A \vdash e: \oldtexttt{int}}{A \vdash \oldtexttt{new int[e]}: \oldtexttt{int[]}}$$
- handling the array length property: $$\frac{A \vdash e: \oldtexttt{int[]}}{A \vdash \oldtexttt{e.length}: \oldtexttt{int}}$$
- what needs to be checked in a method call in Java?
- the method being called needs to exist
- the type of the actual parameter should match the formal parameter
Declaring and calling methods in Java:
u2 m(u a) { // declaration
s
return e2;
}
m(e); // call
- when type checking the method body, need to ensure: $$A \vdash s, \quad A \vdash e_2: u_2$$
- to type check the method call: $$\frac{A \vdash e: t \quad A \vdash m: u \rightarrow u_2 \quad t = u}{A \vdash \oldtexttt{m(e)}: u_2}$$
  - note that we need a mechanism to find the type of the method, eg. `m` takes a parameter of type `u` and returns type `u2`
- expressions such as `new C()`, `this`, `(C) e` refer to classes:
  - need a new type $C$, representing some class:
    - $C$ is another contextual record similar to the symbol table $A$
    - $A, C \vdash e: t$
      - ie. expression $e$, given program context from table $A$ and class $C$, can lead to conclusion $t$ when type checking
  - `this` refers to the lexically enclosing class type
- handling `new C()`:
  - need to check that $C$ exists
- handling casts: $$\frac{A, C \vdash e: t}{A, C \vdash \oldtexttt{(D)e}: D}$$
  - again, may need a runtime check (like array indexing) to actually perform the type cast
    - in Java, upcasts will always succeed, while downcasts need a runtime check
- handling subtyping:
  - the notation $C \leq D$ indicates the class `C` is a subclass of the class `D`, perhaps transitively or reflexively
    - eg. for primitives as well, `char <= short <= int <= long <= float <= double`
  - note that: $$\frac{}{C \leq C}, \quad \frac{C \leq D \quad D \leq E}{C \leq E}$$
  - when $u \leq t$:
    - for polymorphic assignments: $$\frac{A, C \vdash x: t \quad A, C \vdash e: u}{A, C \vdash \oldtexttt{x=e}}$$
    - for method parameters: $$\frac{A, C \vdash a: D \quad A, C \vdash e: u \quad \oldtexttt{D.m}: t \rightarrow t_2}{A, C \vdash \oldtexttt{a.m(e)}: t_2}$$
  - note that to check subclassing relationships, we can either use visitors to traverse the `extends` relationship on the fly, or cache a previous result (a sketch follows the code example below)
Polymorphism in Java:
// class B and C both extend class A
A x = (e ? new B() : new C());
// Compiler checks *both* B and C have correct subclassing relationship to A.
// At runtime, the type of x will be related to either B or C.
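A minimal sketch of deciding $C \leq D$ by walking a cached `extends` map (the `parent` map and method names are assumptions; it also assumes the cycle check discussed later has already passed, so the walk terminates):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: subtype check over the cached extends relationship.
public class SubtypeCheck {
    // child class name -> direct superclass name (absent => no superclass)
    static Map<String, String> parent = new HashMap<>();

    // C <= D: reflexive, then follow extends edges transitively
    static boolean subtype(String c, String d) {
        for (String cur = c; cur != null; cur = parent.get(cur)) {
            if (cur.equals(d)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        parent.put("B", "A");   // class B extends A
        parent.put("C", "A");   // class C extends A
        System.out.println(subtype("B", "A"));  // true
        System.out.println(subtype("B", "B"));  // true (reflexive)
        System.out.println(subtype("A", "B"));  // false
    }
}
```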
Extended method example:
class D {
t1 f1
t2 f2
u3 m(u1 a1, u2 a2) {
t1_1 x1
t2_2 x2
s
return e;
}
}
- hypotheses for type checking the above method:
  - `a1, a2, x1, x2` are all different
  - $A, C \vdash s$, where $C = D$:
    - but what should the symbol table $A$ hold?
      - `f1:t1, f2:t2, a1:u1, a2:u2, x1:t1_1, x2:t2_2`
    - when using $A$, we should search from the bottom of the table to handle variable shadowing
  - $A, C \vdash e: u_3'$, where $u_3' \leq u_3$
- conclusion from the above hypotheses:
  - $A, C \vdash \oldtexttt{D.m}$, ie. the method `D.m` type-checks
-
Sample Java program:
class Main {...}
class C1 extends D1 {...}
...
class Cn extends Dn {...}
- type checking responsibilities (a sketch of these whole-program checks follows the list):
  - `main` must exist
  - `C1...Cn` need to all be different
  - `D1...Dn` need to all exist
  - no `extends` cycle in classes
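A minimal sketch of these checks, assuming the class declarations have already been collected into a name set and a `parent` (`extends`) map; the names are illustrative:

```java
import java.util.*;

// Minimal sketch: whole-program checks over the class table.
public class ClassTableCheck {
    public static void check(Map<String, String> parent, Set<String> classes) {
        // C1...Cn all different: assumed enforced while building `classes`,
        // by rejecting a duplicate class name as it is inserted.

        // D1...Dn must all exist
        for (String d : parent.values()) {
            if (!classes.contains(d))
                throw new RuntimeException("unknown superclass: " + d);
        }

        // no extends cycle: follow parent links, remembering what we have seen
        for (String c : classes) {
            Set<String> seen = new HashSet<>();
            for (String cur = c; cur != null; cur = parent.get(cur)) {
                if (!seen.add(cur))
                    throw new RuntimeException("extends cycle involving: " + cur);
            }
        }

        // main must exist: in MiniJava, the single main class
        if (!classes.contains("Main"))
            throw new RuntimeException("no Main class");
    }

    public static void main(String[] args) {
        Map<String, String> parent = Map.of("C1", "D1", "D1", "C1"); // a cycle
        check(parent, new HashSet<>(Set.of("Main", "C1", "D1")));    // throws
    }
}
```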
MiniJava generic type syntax:
class C <X*> extends N {
S* f*; // class fields
k // class constructor method
// zero or more methods of the following pattern:
<Y*> U m(U* x*) {
...
return e;
}
...
}
// additional new supported expressions:
e.<V*>m(e*)
new N (e*)
// Where N ::= C<T*>
// and S, T ::= N | X
// and U, V ::= N | Y
// and X, Y are *type parameters*.
// The asterisk indicates a vector of possibly multiple values.
Example with generic types:
class List<A> extends Object {
// ^declaration of A
<R> R accept(ListVisitor<R, A> f) {
// ^decl ^use ^use ^use
// The R,A in ListVisitor<R,A> are used as type vars
// in an instantiation of *another* class ListVisitor.
return this.accept(f);
}
}
class ListNull<B> extends List<B> {
// ^decl ^instantiates List with B as type var A
<W> W accept(ListVisitor<W, B> f) {
// ^decl ^use ^use ^use
return this.accept(f);
}
}
- statements to type check in the above example:
  - `return this.accept(f)` trivially type checks due to recursion
  - `extends List<B>` establishes the inheritance relationship to `List`
    - while linking together `B` with the type variable `A` in the declaration of `List`
to typecheck the entire generic class declaration:
$$\frac{\vdash \oldtexttt{M*} \textit{in } \oldtexttt{C<X*>}}{\vdash \oldtexttt{class C<X*> extends N { S* f*; k M* }}}$$ -
to typecheck a generic method declaration: $$\frac{\vdash x^: U^ \quad \vdash \oldtexttt{this}: \oldtexttt{C<X*>} \quad \vdash e: S, S \leq U \quad \textit{override}(m, N, \oldtexttt{<Y*>U*} \rightarrow U)}{\vdash \oldtexttt{<Y*> U m(U* x*) { return e; }} \textit{in } \oldtexttt{C<X*>}}$$
-
to check method overriding: $$\frac{\textit{mtype}(m, N) = \oldtexttt{<Z*>U*} \rightarrow U \implies T^* \leq U^[Y^/Z^] \wedge T \leq U[Y^/Z^]}{\textit{override}(m, N, \oldtexttt{<Y>T*} \rightarrow T)}$$
- where
$U[Y/Z]$ indicates the type$U$ with all instances of type$Y$ replaced with$Z$ - this operation is used to instantiate a type variable declaration with a different name, ie. replacing a declaration with a usage
- where
-
to implement
mtype
: $$\frac{\oldtexttt{class C <X*> ... { ... M* }} \quad m \in M^}{\textit{mtype}(m, \oldtexttt{C<T>}) = (\oldtexttt{<Y*>U*} \rightarrow U)[X^/T^]}$$ -
and if the method is not found: $$\frac{\oldtexttt{class C <X*> extends N { ... M* }} \quad m \notin M^* }{\textit{mtype}(m, \oldtexttt{C<T*>}) = \textit{mtype}(m, N[X^/T^])}$$
-
to type check a method call expression: $$\frac{\vdash e: T \quad \textit{mtype}(m, T) = \oldtexttt{<Y*>U*} \rightarrow U \quad \vdash e^: S^ \quad S^* \leq U^[V^/Y^]}{\vdash \oldtexttt{e.m(e)}: [V^/Y^]U}$$
\newpage{}
- Sparrow is the intermediate language used in CS132
- characteristics:
  - no classes
    - methods in classes have concatenated names, eg. `Fac.ComputeFac` becomes `FacComputeFac`
  - uses `goto` to implement `if else`
  - uses brackets to indicate heap loads or stores
  - no global variables; only the heap and functions have global visibility
  - functions may have extra parameters added
- a program `p` is a series of functions, `p ::= F1...Fm`
- a function declaration `F` has syntax `F ::= func f(id1...idf) b`
- a block `b` is a series of instructions, `b ::= i1...in return id`
- an instruction can be:
  - a label `l:`
  - an assignment `id = c`, where `c` is an integer literal:
    - or `id = @f`, where `f` is a function
  - an operation `id = id + id`, `id = id - id`, `id = id * id`
  - a less-than test `id = id < id`
  - a heap load or store: `id = [id + c]`, `[id + c] = id`, where `c` is the offset and heap addresses are valid
  - an allocation `id = alloc(id)`:
    - eg. `v0 = alloc(12)` allocates 12 bytes
      - creates a 3-tuple of addresses to values accessible by `[v0 + 0]`, `[v0 + 4]`, `[v0 + 8]`
  - a print `print(id)`
  - an error print `error(s)`
  - an unconditional goto `goto l`, where `l` is a label
  - a conditional goto `if0 id goto l`, which jumps if `id` contains 0
  - a function call `id = call id(id1...idf)`, where `id` contains a function name
- identifiers can be any reasonable identifiers, except `a2...a7`, `s1...s11`, `t0...t5`, which are all RISC-V register names
\newpage{}
- overall translation pipeline is:
  - MiniJava:
    - mostly unbounded, eg. in the number of variables, parameters, methods, classes, etc.
  - Sparrow:
    - still unbounded
    - may even generate new variables for simplicity / ease of translation
  - Sparrow-V:
    - number of registers becomes bounded
    - want to minimize variables allocated on the stack
  - RISC-V:
    - register count still bounded
- program state consists of the tuple $(p, H, b^*, E, b)$:
  - $p$ is the program
  - $H$ is the heap that maps from heap addresses to tuples of values
    - the tuple can be indexed into
  - $b^*$ is the body of the function that is executing right now
    - can only perform a `goto` within this function block
  - $E$ is the environment that maps from identifiers to values
  - $b$ is the remaining part of the block that is executing right now
    - ie. $b^*$ contains the entire function block, while $b$ only contains the current and remaining statements in the block
- in a state transition, we want to step from one state to the next: $$(p, H, b^*, E, b) \rightarrow (p, H', b^{*\prime}, E', b')$$
- assignment state transition: $$(p, H, b^*, E, \oldtexttt{id=c} \enskip b) \rightarrow (p, H, b^*, E \cdot [id \mapsto c], b)$$
- arithmetic state transition: $$(p, H, b^*, E, \oldtexttt{id=id1+id2} \enskip b) \rightarrow (p, H, b^*, E \cdot [id \mapsto (c_1+c_2)], b)$$
  - where $E(id_1) = c_1$ and $E(id_2) = c_2$
    - ie. this transition requires a runtime check in the environment
- heap load state transition: $$(p, H, b^*, E, \oldtexttt{id=[id1+c]} \enskip b) \rightarrow (p, H, b^*, E \cdot [id \mapsto (H(a_1))(c_1+c)], b)$$
  - where $E(id_1) = (a_1, c_1)$ such that $a_1$ is a heap address and $c_1$ its offset, and $(c_1 + c) \in domain(H(a_1))$
    - ie. the new computed offset is a valid index into the tuple on the heap
- heap allocation state transition: $$(p, H, b^*, E, \oldtexttt{id=alloc(id1)} \enskip b) \rightarrow (p, H \cdot [a \mapsto t], b^*, E \cdot [id \mapsto (a, 0)], b)$$
  - where $E(id_1) = c$ and $c$ is divisible by 4, $a$ is a fresh address, and $t = [0 \mapsto 0, 4 \mapsto 0, ... (c-4) \mapsto 0]$
- unconditional `goto` state transition: $$(p, H, b^*, E, \oldtexttt{goto l} \enskip b) \rightarrow (p, H, b^*, E, b')$$
  - where $find(b^*, l) = b'$
    - $find(b, l)$ is used to find the label $l$ inside the block $b$
- conditional `goto` state transition: $$(p, H, b^*, E, \oldtexttt{if0 id goto l} \enskip b) \rightarrow (p, H, b^*, E, b')$$
  - where $E(id) = 0$ and $find(b^*, l) = b'$
- function call state transition: $$(p, H, b^*, E, \oldtexttt{id = call id0(id1...idf)} \enskip b) \rightarrow (p, H', b^*, E \cdot [id \mapsto E'(id')], b)$$
  - where $E(id_0) = f$, $p$ contains $\oldtexttt{func f (id1'...idf') b'}$, and $E' = [id_1' \mapsto E(id_1), id_2' \mapsto E(id_2), ...]$
  - the function $f$ is then called through the following state transition: $(p, H, b', E', b') \rightarrow (p, H', b', E', \oldtexttt{return} \enskip id')$
    - note the intermediate transfer of control to the callee
- want to translate $e, k \rightarrow c, k'$ where $e$ is some expression in MiniJava and $c$ is the output code in Sparrow, while managing:
  - additional variables (that could have been added during translation)
  - labels used in jumps in the IR
  - $k$ is a "fresh" or new integer number that has not yet been used
    - can be utilized to generate variable and label names
    - after translation, the number $k'$ should be the next new number
    - by convention, the result of the expression $e$ is stored in the variable $t_k$
- simple expression: $$\oldtexttt{5}, k \rightarrow \oldtexttt{tk = 5}, k+1$$
- expression with a local variable: $$\oldtexttt{x}, k \rightarrow \oldtexttt{tk = x}, k+1$$
- recursive translations (a visitor-style sketch follows the examples below): $$\oldtexttt{e1+e2}, k \rightarrow c_1 c_2 \enskip \oldtexttt{tk=tl+tk1}, k_2$$
  - given that $e_1, k+1 \rightarrow c_1, k_1$ and $e_2, k_1 \rightarrow c_2, k_2$, and where $t_l = t_{k+1}$
  - note that by convention, the result of $c_1$ will be stored in $t_{k+1}$ and the result of $c_2$ will be stored in $t_{k_1}$
- example addition translation: $$\oldtexttt{7+9}, 3 \rightarrow \oldtexttt{t4=7 t5=9 t3=t4+t5}, 6$$
  - after $\oldtexttt{7}, 4 \rightarrow \oldtexttt{t4=7}, 5$ and $\oldtexttt{9}, 5 \rightarrow \oldtexttt{t5=9}, 6$
- example nested addition translation: $$\oldtexttt{(7+9)+11}, 3 \rightarrow \oldtexttt{t5=7 t6=9 t4=t5+t6 t7=11 t3=t4+t7}, 8$$
  - $\oldtexttt{7}, 5 \rightarrow \oldtexttt{t5=7}, 6$ and $\oldtexttt{9}, 6 \rightarrow \oldtexttt{t6=9}, 7$
  - $\oldtexttt{7+9}, 4 \rightarrow \oldtexttt{t5=7 t6=9 t4=t5+t6}, 7$
  - $\oldtexttt{11}, 7 \rightarrow \oldtexttt{t7=11}, 8$
  - variables `t4...t6` handle `7+9`, `t7` handles `11`, and `t3` holds the entire expression
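A minimal sketch of this translation in Java following the $k$ convention; the tiny `Expr` hierarchy and the `Result` record are assumptions, and in the project this logic would live in the generated visitor classes instead:

```java
// Minimal sketch: translating integer literals and additions to Sparrow code.
public class ExprTranslator {
    // output code plus the next fresh number k'
    record Result(String code, int k) {}

    sealed interface Expr permits Num, Plus {}
    record Num(int value) implements Expr {}
    record Plus(Expr e1, Expr e2) implements Expr {}

    // translate e with fresh counter k; the result of e is stored in t<k>
    static Result translate(Expr e, int k) {
        if (e instanceof Num n) {
            return new Result("t" + k + " = " + n.value() + "\n", k + 1);
        }
        Plus p = (Plus) e;
        Result r1 = translate(p.e1(), k + 1);   // result of e1 lands in t{k+1}
        Result r2 = translate(p.e2(), r1.k());  // result of e2 lands in t{k1}
        String code = r1.code() + r2.code()
            + "t" + k + " = t" + (k + 1) + " + t" + r1.k() + "\n";
        return new Result(code, r2.k());        // k' = k2
    }

    public static void main(String[] args) {
        // (7+9)+11 with k = 3, as in the nested example above
        Result r = translate(new Plus(new Plus(new Num(7), new Num(9)), new Num(11)), 3);
        System.out.print(r.code());   // t5=7, t6=9, t4=t5+t6, t7=11, t3=t4+t7
        System.out.println(r.k());    // 8
    }
}
```

Running it on `(7+9)+11` with $k = 3$ reproduces the nested addition example above.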
- want to translate $s, k \rightarrow c, k'$ where $s$ is some statement in MiniJava and $c$ is the output code in Sparrow
  - $k'$ will differ from $k$ when a subexpression is contained within the statement
- simple recursive statements: $$s_1s_2, k \rightarrow c_1c_2, k_2$$
  - where $s_1, k \rightarrow c_1, k_1$ and $s_2, k_1 \rightarrow c_2, k_2$
  - no return value since statements only have a side effect, so only the ordering of the output code matters
- simple assignment statement: $$\oldtexttt{x=e}, k \rightarrow c \enskip \oldtexttt{x=tk}, k'$$
  - where $e, k \rightarrow c, k'$ and the result of $c$ is held in variable $t_k$ by convention
- `if else` statement: $$\oldtexttt{if(e) s1 else s2}, k \rightarrow c_k, k_2$$
  - $e, k+1 \rightarrow c_e, k_e$
  - $s_1, k_e \rightarrow c_1, k_1$
  - $s_2, k_1 \rightarrow c_2, k_2$
  - to generate control code, need to generate labels and use jumps:
    - can use $k$ in order to avoid label duplication
    - also need to linearize the code while allowing $s_1$ to execute without $s_2$, and vice versa
    - uses an unconditional jump and a conditional jump

Generated code template for the `if else` statement:
c_e // result stored in temporary t_{k+1}
if0 t_{k+1} goto else_k
c1
goto end_k
else_k:
c2
end_k:
- `while` statement: $$\oldtexttt{while(e) s}, k \rightarrow c_k, k_s$$
  - $e, k+1 \rightarrow c_e, k_e$
  - $s, k_e \rightarrow c_s, k_s$
  - again, need to generate control code
  - note that the compiler does not care about how many times the loop runs
    - thus the label subscript $k$ simply depends on static code structure
  - conventionally, `for` loops are usually reduced into `while` loops

Generated code template for the `while` statement:
loop_k:
c_e // result stored in temporary t_{k+1}
if0 t_{k+1} goto end_k
c_s
goto loop_k
end_k:
Translating a statement sequence in Java, returning both the generated code and the next fresh number:
class Result {
String code;
int k1;
}
Result visit(Seq n, int k) {
Result res1 = n.f1.accept(this, k);
Result res2 = n.f2.accept(this, res1.k);
return new Result(res1.code + res2.code, res2.k);
}
Translating the following code starting from $k = 0$:
while (true) {
if (false)
x = 5;
else
y = 7;
}
Translation process:
// handling `while (e) s`:
true, 1 -> t1 = 1, 2 // e
// handling s, which has the form `if (e1) s1 else s2`:
false, 3 -> t3 = 0, 4 // e1
5, 4 -> t4 = 5, 5 // s1
x=5, 4 -> x = t4, 5
7, 5 -> t5 = 7, 6 // s2
y=7, 5 -> y = t5, 6
if (false) x=5 else y=7, 2 -> { // s
t3 = 0
if0 t3 goto else2
t4 = 5
x = t4
goto end2
else2:
t5 = 7
y = t5
end2:
}, 6
while (true) {if (false) x=5 else y=7}, 0 -> {
loop0:
t1 = 1
if0 t1 goto end0
// code generated from s, the if-else statement
goto loop0
end0:
}, 6
- need to generate Sparrow IR for the following MiniJava code involving arrays:
  - expressions like `arr[idx]`, `new int[len]`, `arr.length`
  - statements like `arr[idx] = val`
- in Sparrow, arrays will be represented on the heap:
  - can create an array by allocating space on the heap
  - need to track and store the array length somewhere:
    - by convention, simply store the length of the array at offset 0 on the heap, and shift array positions in the heap accordingly
    - thus in Sparrow, the first array element is stored at offset 4, and the last element is at offset $4n$ where $n$ is the array length
    - total amount to allocate is $4 \times (n+1)$ in order to store the length
- array length: $$\oldtexttt{e.length}, k \rightarrow c \enskip \oldtexttt{tk=[tl+0]}, k'$$
  - after evaluating $e, k+1 \rightarrow c, k'$ and where $t_l = t_{k+1}$
- array allocation: $$\oldtexttt{new int[e]}, k \rightarrow c \enskip \oldtexttt{tk'=4*(tl+1) tk=alloc(tk') [tk+0]=tl}, k'+1$$
  - after evaluating $e, k+1 \rightarrow c, k'$ and where $t_l = t_{k+1}$
  - $t_{k'}$ is a new temporary to store $4 \times (e+1)$, while the original array length is saved so it can be stored back into the array
  - in the Sparrow spec, `alloc` will initialize all heap entries to zero
- array indexing: $$\oldtexttt{e1[e2]}, k \rightarrow c_1c_2 \enskip \oldtexttt{tk2=4*(tk1+1) tl=tl+tk2 tk=[tl+0]}, k_2+1$$
  - after evaluating $e_1, k+1 \rightarrow c_1, k_1$ and $e_2, k_1 \rightarrow c_2, k_2$, and where $t_l = t_{k+1}$
  - note that the second part of a heap lookup must be a constant, so we build up the entire dynamic offset in the first parameter
  - to check bounds, need to ensure that $0 \leq e_2 < n$ where $n$ is the length of the array in $e_1$
  - a similar process applies for array assignments of the form `a[e1] = e2`
Implementing a bounds check:
t_{k2+1} = -1 < tk1
t_{k2+2} = [t_{k+1} + 0]
t_{k2+3} = tk1 < t_{k2+2}
t_{k2+4} = t_{k2+1} * t_{k2+3} // multiplication acts as logical AND
// including a bounds check uses up to k2+5 for temporaries
if0 t_{k2+4} goto error
... // loading or storing code here
goto end
error:
error("out of bounds")
end:
MiniJava method example:
class C {
Y m(T a) {
S;
return e;
}
}
In Sparrow, there are no more classes:
func C.m(this a) { // name mangling
c
return x
}
- how to implement objects in Sparrow?
  - like arrays, represent objects on the heap
  - object fields will be given by offsets into the heap
    - eg. to access the field $f$: $f, k \rightarrow \oldtexttt{tk=[this+L]}, k+1$ where $L$ is the static offset remembered by the compiler associated with the field $f$
  - but how can we handle `this`, ie. the current object?
    - need to translate the method call `e1.m(e2)` into `C.m(t1 t2)`
      - where `t1, t2` hold the expression results of `e1, e2` respectively
    - note that we can identify the full mangled name for `e1.m` by finding the type `C` of `e1`
  - when we allow for inheritance and `extends`, we no longer know exactly which methods we are calling
    - similarly, object fields can also be inherited and accessed in subclasses
MiniJava inheritance example:
class B {
t p() { this.m(); }
u m() { ... }
}
class C extends B {
u m() { ... } // overrides B.m
}
In Sparrow:
B.p(this) {
B.m(...) // this is now *wrong*, could be called with C instead of B
}
B.m(this) { ... }
C.m(this) { ... }
- to handle inheritance, need to add an additional level of indirection (a table-building sketch follows the example below):
  - add a method table for all objects:
    - by convention, the method table is stored at position 0, similarly to where the length is stored in arrays
      - shifts the locations of object fields by 4
    - the method table contains entries pertaining to all inherited visible methods
      - each entry holds the address of a function
    - for the above example:
      - a `B` object's method table should contain references to `B.p, B.m`
      - while a `C` object's method table should contain references to `B.p, C.m`
      - note that the offsets across related objects should line up
    - method tables are shared across objects of the same class, so each class can have a single method table allocated, and all objects will point to that single copy
  - to call a class function in OOP:
    - load the method table
    - load the function address from the table
      - additional indirection to handle inheritance
    - call
- consider building method tables for the following inheritance relationship:
  - class `A` has methods `m, n`
    - method table holds refs. to `A.m, A.n`
  - class `B extends A` has no methods
    - method table holds refs. to `A.m, A.n`
  - class `C extends B` has methods `p, m`
    - method table holds refs. to `C.m, A.n, C.p`, regardless of method definition order
  - class `F extends C` has method `q`
    - method table holds refs. to `C.m, A.n, C.p, F.q`
  - thus the compiler needs to associate the method `n` with offset 4 into method tables, etc.
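A minimal sketch of building these tables by copying the superclass layout, overriding entries in place, and appending new methods; the class names follow the `A`/`B`/`C`/`F` example above, and the list-of-strings table representation is an assumption:

```java
import java.util.*;

// Minimal sketch: per-class method tables that keep inherited offsets lined up.
public class MethodTables {
    // class name -> ordered list of mangled entries like "A.m"
    static Map<String, List<String>> tables = new LinkedHashMap<>();

    static void buildTable(String cls, String superCls, List<String> ownMethods) {
        List<String> table = superCls == null
            ? new ArrayList<>()
            : new ArrayList<>(tables.get(superCls));   // inherit the superclass layout
        for (String m : ownMethods) {
            String entry = cls + "." + m;
            int i = indexOfMethod(table, m);
            if (i >= 0) table.set(i, entry);           // override keeps the same offset
            else table.add(entry);                     // new method gets a new slot
        }
        tables.put(cls, table);
    }

    static int indexOfMethod(List<String> table, String m) {
        for (int i = 0; i < table.size(); i++)
            if (table.get(i).endsWith("." + m)) return i;
        return -1;
    }

    public static void main(String[] args) {
        buildTable("A", null, List.of("m", "n"));
        buildTable("B", "A", List.of());
        buildTable("C", "B", List.of("p", "m"));
        buildTable("F", "C", List.of("q"));
        // offset of a method = 4 * index into its class's table
        tables.forEach((c, t) -> System.out.println(c + ": " + t));
        // A: [A.m, A.n]   B: [A.m, A.n]   C: [C.m, A.n, C.p]   F: [C.m, A.n, C.p, F.q]
    }
}
```

Reusing the superclass's slot for an override is what keeps offsets consistent across related classes, so a call site can use one fixed offset regardless of the object's dynamic class.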
- generally, calling a method `e.m()` is implemented using a load, load, call on the method table:
  - in some special scenarios involving a unique target, we can replace this implementation with a single direct call
  - ie. statically replacing a general call sequence with a much faster, specialized one
- what are the characteristics that make a call have a unique target?
  1. what are the classes of the results of `e` in `e.m()`?
     - eg. classes `A, B, C`
  2. which methods `m` can be called?
     - eg. two methods, `A.m` and `C.m`
  3. is the `m` that is called always the same?
     - if not, eg. `A.m` is different from `C.m`, cannot go to the specialized sequence
  - but `e` may have nested callsites within it
    - thus answering question (1) requires cyclically answering (2)
- old approach used at runtime in Objective-C:
  - instead of having individual method tables for each object, have one giant method table at runtime representing the methods for all possible object types and their method names
  - at runtime, check this large method table to find which method to call
  - requires a lookup into the table, as well as storing a representation of the type in each object
- a static approach is to perform Class Hierarchy Analysis (CHA):
  - revolves around knowing class hierarchies
  - consider the simple hierarchy `A -> B -> C`, with methods `A.m, C.m`:
    - the declaration `A x` and a later call `x.m()` has multiple possibilities, according to the hierarchy
    - the code `x = new C()` would lead to calling `C.m`, while `x = new B()` would call `A.m`
  - CHA approach:
    - use type declarations
    - use the class hierarchy
- another static approach is called Rapid Type Analysis (RTA):
  - again uses class hierarchies
  - consider the previous example from CHA:
    - RTA would notice that there is no `x = new C()` in the program
    - thus RTA will realize that `x.m()` has the unique target of `A.m`
  - requires just another linear scan to check for `new` statements
  - RTA approach:
    - use type declarations
    - use the class hierarchy
    - use `new` statements
- another static approach is called Type-Safe Method Inlining (TSMI):
  - takes an entire method and inlines it at the call site whenever it is a unique target
  - not as expensive or as good as CFA
Illustrating issue with TSMI:
interface I { void m(); }
class C implements I {
C f;
void m() {
this.f = this;
}
}
I x = new C();
x.m();
// becomes translated to ==>
x.f = x; // an error since x has type I
// TSMI needs to detect that I will only have type C
- another static approach is called level 0 Control Flow Analysis (0CFA):
  - goes from syntax, to constraints, to information
    - want to generate variables $[e]$ for each expression $e$ that range over the sets of possible class names $e$ can take
  - CFA approach:
    - starting from syntax, generate constraints by performing a linear pass over the code
    - solve the constraints to get the desired sets of class names
    - runs in $O(n^3)$
- syntax of interest and the constraints they imply:
  - `new C()`:
    - implies the constraint $[\oldtexttt{new C()}] = \{C\}$
  - `x = e`:
    - implies the constraint $[e] \subseteq [x]$
  - `e1.m(e2)` and `class C { _ m (_ a) { return e } }`:
    - together imply the conditional constraint:
      - if $C \in [e1]$, then $[e2] \subseteq [a] \wedge [e] \subseteq [e1.m(e2)]$
-
Code example:
A x = new A();
B y = new B();
x.m(new Q());
y.m(new S());
class A {
  void m(Q arg) {
    arg.p();
  }
}
class B extends A {
  void m(Q arg) {...}
}
class Q { ... }
class S extends Q { ... }
- CHA:
  - `x.m`: no unique target
  - `y.m`: unique target, nothing overrides `B.m`
  - `arg.p`: no unique target
- RTA:
  - `x.m`: still no unique target, both `new A()` and `new B()` occur
  - `y.m`: unique target, only one `new`
  - `arg.p`: no unique target
- CFA:
  - `A x = new A()` implies $[\oldtexttt{new A()}] \subseteq [x] \wedge [\oldtexttt{new A()}] = \{A\}$
  - `x.m(new Q())` and `class A { void m(Q arg) { return arg.p(); } }`:
    - imply that if $A \in [x]$ then $[\oldtexttt{new Q()}] \subseteq [arg]$
    - and $[\oldtexttt{arg.p()}] \subseteq [\oldtexttt{x.m(new Q())}]$
  - etc.
  - `x.m`: unique target
  - `y.m`: unique target
  - `arg.p`: unique target
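To show what "generate constraints, then solve" can look like mechanically, here is a small Java sketch of a fixpoint solver over a few of the constraints from this example (the string-keyed sets and hard-coded constraints are my own simplification, not the course's implementation):

```java
import java.util.*;

public class CfaSketch {
    static Map<String, Set<String>> sets = new HashMap<>();

    // the set [var] of possible class names for an expression or variable
    static Set<String> of(String var) {
        return sets.computeIfAbsent(var, k -> new HashSet<>());
    }

    public static void main(String[] args) {
        of("new A()").add("A");                          // [new A()] = {A}
        of("new Q()").add("Q");                          // [new Q()] = {Q}
        String[][] subsets = { { "new A()", "x" } };     // A x = new A() gives [new A()] subset of [x]

        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : subsets)
                changed |= of(c[1]).addAll(of(c[0]));
            // conditional constraint from x.m(new Q()) and class A { void m(Q arg) { return arg.p(); } }:
            // if A is in [x], then [new Q()] subset of [arg] and [arg.p()] subset of [x.m(new Q())]
            if (of("x").contains("A")) {
                changed |= of("arg").addAll(of("new Q()"));
                changed |= of("x.m(new Q())").addAll(of("arg.p()"));
            }
        }
        System.out.println(sets); // [x] = {A} and [arg] = {Q}, so x.m and arg.p have unique targets
    }
}
```

The loop keeps applying the subset and conditional constraints until nothing changes; this toy version ignores efficiency, it only illustrates the saturation process described above.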
- 1CFA:
  - spends exponential time when generating constraints
  - but is more precise than 0CFA, so it is more likely to detect unique targets
  - essentially, uses separate copies of a method's constraints per call site, instead of just one copy
- want to compile lambda expressions while moving from recursion (which builds up an expensive stack) to iteration
- translate lambda expressions to an intermediate form in three steps / approaches:
- tail form
- functions never return
- using continuations
- first-order form
- functions are all top level
- using data structures
- imperative form
- functions take no arguments
- using register allocation
- tail form
Illustrating recursion vs. iteration:
// recursion:
static Function<Integer, Integer> // in class Test; parameter type, then return type
fact = n ->
  n == 0 ? 1 : n * Test.fact.apply(n-1); // builds up a stack

// iteration:
static int factIter(int n) {
  int a = 1;
  while (n != 0) {
    a = n * a;
    n = n - 1;
  }
  return a;
}

// low-level iteration with goto:
factIter: a = 1;
Loop: if (n == 0) { }
      else { a = n*a; n = n-1; goto Loop }
- in continuation passing style (CPS):
  - the continuation is code representing the rest of the computation, ie. what to do with the returned result
  - instead of returning from a function, just call the continuation
General program to tail form:
// use CPS:
static BiFunction<Integer, Function<Integer, Integer>, Integer>
factCPS = (n, k) ->
  n == 0 ?
    k.apply(1)
    : Test.factCPS.apply(n-1, v -> k.apply(n * v)); // builds up continuation
factCPS.apply(4, v -> v); // how to actually call a CPS function
// evaluation of a tail form has *one call* as the last operation
- evaluation of a tail form expression has one call as the last operation:
Tail ::= Simple | Simple.apply(Simple*) | Simple ? Tail : Tail
- while evaluation of a simple expression has no calls
Simple ::= Id | Constant | Simple PrimitiveOp Simple | Id -> Tail
CPS transformation rules:
static Function<...> foo = x -> ...
// ==>
static BiFunction<...'> fooCPS = (x, k) -> k.apply(...)
k.apply(... (foo.apply(a, n-1)) ...) // can only have one call in CPS!
// ==>
fooCPS.apply(a, n-1, v -> k.apply(... v ...)) // extract out foo.apply call
k.apply(foo.apply(a, n-1)) // special case
// ==>
fooCPS.apply(a, n-1, k);
k.apply(y ? ... : ...)
// ==>
y ? k.apply(...) : k.apply(...)
k.apply(foo.apply(a) ? ... : ...)
// ==>
fooCPS.apply(a, v -> k.apply(v ? ... : ...))
// ==>
fooCPS.apply(a, v -> v ? k.apply(...) : k.apply(...))
Translating the original `fact` into its tail form, `factCPS`:
fact = n -> n == 0 ? 1 : n * Test.fact.apply(n-1);
// ==>
factCPS = (n, k) -> k.apply(
n == 0 ? 1 : n * Test.fact.apply(n-1));
// ==>
factCPS = (n, k) ->
n == 0 ? k.apply(1) : k.apply(n * Test.fact.apply(n-1));
// ==>
factCPS = (n, k) ->
n == 0 ? k.apply(1) : Test.factCPS.apply(n-1, v -> k.apply(n * v))
- in CPS form, there are really two continuations at play, `k` itself and `v -> k.apply(...)`:
  Cont ::= v -> v | v -> Cont.apply(n * v)
- if we can represent continuations without lambda functions, we would only have functions at the top level
Implementing continuations purely as datatypes:
interface Continuation {
  Integer apply(Integer a);
}
class Identity implements Continuation {
  public Integer apply(Integer a) { return a; }
}
class FactRec implements Continuation {
  Integer n; Continuation k;
  public FactRec(Integer n, Continuation k) {
    this.n = n; this.k = k;
  }
  public Integer apply(Integer v) { return k.apply(n * v); }
}
CPS to first-order form:
static BiFunction<Integer, Continuation, Integer>
factCPSadt = (n, k) -> // adt - abstract data type
  n == 0 ? k.apply(1) :
    Test.factCPSadt.apply(n-1, new FactRec(n, k));
- from first-order, we can prove that it is possible to purely represent a continuation as a number:
  - every continuation (at least for `factCPS`) is of the form `v -> p * v`
    - eg. `k.apply(1)` is just `p * 1 = p`
    - eg. `v -> k.apply(n * v)` is just `v -> (p * n) * v`, or just `(p * n)`
  - this simplification can only work for some primitive operations, but in general, all first-order forms can be expressed as iteration
First-order form to imperative form:
static BiFunction<Integer, Integer, Integer>
factCPSnum = (n, k) -> n == 0 ? k :
Test.factCPSnum.apply(n-1, k * n);
// get rid of function parameters and use global registers:
static Integer n; static Integer k;
static void factCPSimp() {
if (n == 0) { }
else { k = k*n; n = n-1; factCPSimp(); }
}
// to low-level iteration with gotos:
static Integer n; static Integer k;
factCPSimp:
if (n == 0) { }
else { k = k*n; n = n-1; goto factCPSimp }
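As a quick sanity check on the imperative form (a worked trace of my own, not from the original notes): starting `factCPSimp` with `n = 4` and `k = 1`, the registers update as `(n, k)`: `(4, 1) -> (3, 4) -> (2, 12) -> (1, 24) -> (0, 24)`, so the loop exits with `k = 24 = 4!`, matching `fact.apply(4)`.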
\newpage{}
- in Sparrow, all variables are on the stack, which forces relatively slow access:
  - in the Sparrow-V IR, some variables are held in registers for faster access
  - want to fit as many variables as possible into the registers, in a process called register allocation
- register allocation can be broken down into two subproblems:
  - liveness analysis
  - graph coloring (such that no adjacent nodes are the same color)
  - the interface between the two steps is an interference graph
  - an approach using heuristics
- incremental steps for saving stack variables:
  - no registers used
  - use registers for storing all subcomputations
  - use registers to pass function parameters
    - have to save all registers before function calls in case they are clobbered
  - avoid saving registers that are not utilized after function calls
  - using liveness analysis to reuse registers
Sparrow to Sparrow-V example:
// Sparrow
a = 1 // prog. pt. 1 (point lies just before instruction)
b = 2 // prog. pt. 2
c = a + 3 // prog. pt. 3
print b + c // prog. pt. 4
// Sparrow-V with two registers (s0, s1)
s0 = 1
s1 = 2 // cannot assign to s0, would clobber `a`
c = s0 + 3 // leave `c` on the stack
// but `a` is no longer used, can reuse a register
print s1 + c
// Sparrow-V attempt 2
s0 = 1
s1 = 2
s0 = s0 + 3 // load before store
print s1 + s0
- program points lie between lines of instructions:
- ie. where labels can be introduced
- first program point lies before the first instruction
- in liveness analysis, want to examine:
  - when is a variable live, or what is its live range?
    - `a` is live in the range $[1, 3]$, ie. from 1 up to 3
    - `b` is live in $[2, 4]$
    - `c` is live in $[3, 4]$
  - note that the live ranges for `a, b` and `b, c` overlap, while the live ranges for `a, c` do not
    - ie. overlapping live ranges conflict
- in an interference graph:
  - nodes represent variables
  - edges represent conflicts
  - allocating registers is similar to coloring this graph
What is the fewest number of registers to entirely allocate the following code?
a = 1 // p.1
b = 10 // p.2
c = 8 + b // p.3
d = a + c // p.4
e = a + d // p.5
f = c + e // p.6
d = a + f // p.7
c = d + f // p.8
f = a + c // p.9
return c + f // p.10
Live ranges, from inspection:
a : [1, 9]
b : [2, 3]
c : [3, 6] *and* [8, 10] (c can be reused before it is declared again)
d : [4, 5] and [7, 8]
e : [5, 6]
f : [6, 8] and [9, 10]
Corresponding interference graph:
\begin{center} \begin{tikzpicture} \node[shape=circle,draw=black,fill=red,label=left:$s_0$] (A) at (0,0) {a}; \node[shape=circle,draw=black,fill=yellow,label=left:$s_1$] (B) at (0,3) {b}; \node[shape=circle,draw=black,fill=orange,label=left:$s_3$] (C) at (2.5,4) {c}; \node[shape=circle,draw=black,fill=green,label=$s_2$] (E) at (5,0) {e}; \node[shape=circle,draw=black,fill=yellow,label=left:$s_1$] (F) at (2.5,-1) {f}; \node[shape=circle,draw=black,fill=green,label=$s_2$] (D) at (5,3) {d}; \path[-] (A) edge (B); \path[-] (A) edge (C); \path[-] (A) edge (D); \path[-] (A) edge (E); \path[-] (A) edge (F); \path[-] (C) edge (D); \path[-] (C) edge (E); \path[-] (C) edge (F); \path[-] (D) edge (F); \end{tikzpicture} \end{center}
Sparrow-V output code:
s0 = 1
s1 = 10
s3 = 8 + s1
s2 = s0 + s3
s1 = s0 + s2
s1 = s3 + s1
s2 = s0 + s1
s3 = s2 + s1
s1 = s0 + s3
return s3 + s1
- given some statement $n$:
  - $def[n]$ includes the variables defined or assigned in $n$
  - $use[n]$ includes the variables used in $n$
  - $in[n]$ includes the variables live coming in to $n$
  - $out[n]$ includes the variables live coming out of $n$
- in a control-flow graph:
  - nodes are statements
  - directed edges are possible control flow directions
  - a node's successor(s) are the nodes immediately reachable from it along an edge
- defining liveness equations:
  $$in[n] = use[n] \enskip \bigcup \enskip (out[n] - def[n])$$ $$out[n] = \bigcup_{s \in succ(n)} in[s]$$
Example code:
a = 0 // 1
L1: b = a + 1 // 2
c = c + b // 3
a = b * 2 // 4
if a < 10 goto L1 // 5
return c // 6
// in control-flow graph:
// 1->2, 2->3, 3->4, 4->5, 5->6, 5->2
Solving the liveness equations for the example code, iteration by iteration:

$n$ | def | use | $in_1$ | $out_1$ | $in_2$ | $out_2$ | $in_3$ | $out_3$
---|---|---|---|---|---|---|---|---
1 | a |  |  | a |  | a, c | c | a, c
2 | b | a | a | b, c | a, c | b, c | a, c | b, c
3 | c | b, c | b, c | b | b, c | b | b, c | b, c
4 | a | b | b | a | b | a, c | b, c | a, c
5 |  | a | a | a, c | a, c | a, c | a, c | a, c
6 |  | c | c |  | c |  | c |
- algorithm steps:
  - initially, $in$ and $out$ are both $\emptyset$
  - use the liveness equations as update steps, iteratively
    - in the table, $in_k$ represents $in$ at iteration $k$
  - keep iterating until no change
- given the final $in$ and $out$ values, we can determine interfering variables:
  - all variables appearing in the same box interfere with each other, pairwise
- given $n$ statements and $O(n)$ variables, there are $O(n^2)$ iterations:
  - ie. at each stage, at least one element will be added to some $out$ set
  - at each iteration, $n$ set-unions are performed, each with a runtime of $O(n)$
- thus, the algorithm has $O(n^4)$ runtime
  - can be reduced to $O(n^2)$ with more detailed analysis
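The iterative algorithm above is short enough to sketch directly; the following Java program (the data layout is my own, hypothetical, not the course's Sparrow structures) encodes the example's `def`/`use` sets and successors, then iterates the liveness equations until no `in`/`out` set changes:

```java
import java.util.*;

public class LivenessSketch {
    public static void main(String[] args) {
        // the example: 1: a=0  2: b=a+1  3: c=c+b  4: a=b*2  5: if a<10 goto 2  6: return c
        String[][] def  = { {"a"}, {"b"}, {"c"}, {"a"}, {}, {} };
        String[][] use  = { {}, {"a"}, {"c", "b"}, {"b"}, {"a"}, {"c"} };
        int[][]    succ = { {1}, {2}, {3}, {4}, {5, 1}, {} };   // 0-indexed statements

        int n = def.length;
        List<Set<String>> in = new ArrayList<>(), out = new ArrayList<>();
        for (int i = 0; i < n; i++) { in.add(new HashSet<>()); out.add(new HashSet<>()); }

        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < n; i++) {
                // in[i] = use[i] union (out[i] - def[i])
                Set<String> newIn = new HashSet<>(out.get(i));
                newIn.removeAll(Arrays.asList(def[i]));
                newIn.addAll(Arrays.asList(use[i]));
                // out[i] = union over successors s of in[s]
                Set<String> newOut = new HashSet<>();
                for (int s : succ[i]) newOut.addAll(in.get(s));
                changed |= !newIn.equals(in.get(i)) || !newOut.equals(out.get(i));
                in.set(i, newIn);
                out.set(i, newOut);
            }
        }
        for (int i = 0; i < n; i++)
            System.out.println((i + 1) + ": in=" + in.get(i) + " out=" + out.get(i));
    }
}
```

Running it converges to the same final sets as the last iteration of the table above.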
- graph coloring, ie. liveness allocation, is the problem of allocating colors / registers given live ranges
- liveness analysis in linear time:
  - instead of the full $O(n^4)$ liveness analysis algorithm, is there a linear time approximation?
  - instead of dividing up live ranges, simply take the entire interval from a variable's first declaration to its last use:
    - intervals can be retrieved in linear time
    - ignores gaps in live ranges, and interpolates over them
    - seems to be a reasonable approximation that still indicates interfering variables
- linear scan liveness allocation:
  - now that we have a linear liveness analysis algorithm, want to achieve liveness allocation in linear time as well:
    - will not be as thorough of an allocation as running the full $O(n^4)$ liveness analysis and a complete graph coloring in exponential time
    - but should be a good estimate, using heuristics
- linear scan algorithm:
  - sort live range intervals by starting point
  - perform a greedy coloring in a left-to-right scan:
    - tentatively assign overlapping ranges to different colors, ie. registers
    - once a previous interval expires before the start point of the current interval, finalize its coloring
  - when we have run out of registers, we have to spill a variable onto the stack:
    - ie. store it on the stack instead of in a register
    - as a heuristic, spill the one that extends the furthest into the future (based on end point)
    - note that we can only take over, ie. spill, tentatively assigned registers
  - thus, this algorithm runs in $O(n)$ time
    - there are only a constant number of registers to track tentative assignments for
Liveness analysis and allocation example using linear algorithms:
a = 1 // 1
b = 10 // 2
c = 9 + a // 3
d = a + c // 4
e = c + d // 5
f = b + 8 // 6
c = f + e // 7
f = e + c // 8
b = c + 5 // 9
return f // 10
Live range intervals (using linear approximation):
a : [1, 4]
b : [2, 9]
c : [3, 9]
d : [4, 5]
e : [5, 8]
f : [6, 10]
// note here that the intervals in this problem are already in sorted order
- given three registers `r1, r2, r3` to color with:
  - assign `r1` to `a`
  - assign `r2` to `b`
  - assign `r3` to `c`
  - `a` has expired, finalize `r1` for `a`
    - assign `r1` to `d`
  - `d` immediately expires, finalize `r1` for `d` as well
    - assign `r1` to `e`
  - no more registers, have to spill:
    - spill `f` onto the stack
  - coloring is complete
Allocation results using three registers:
a : r1
b : r2
c : r3
d : r1
e : r1
f : <mem>
// r1 can be reused multiple times without interference
- given two registers `r1, r2` to color with:
  - assign `r1` to `a`
  - assign `r2` to `b`
  - no more registers, have to spill:
    - spill `b` onto the stack (could also spill `c`)
    - assign `r2` to `c`
  - `a` has expired, finalize `r1` for `a`
    - assign `r1` to `d`
  - `d` immediately expires, finalize `r1` for `d` as well
    - assign `r1` to `e`
  - no more registers, have to spill:
    - spill `f` onto the stack
  - coloring is complete
Allocation results using two registers:
a : r1
b : <mem>
c : r2
d : r1
e : r1
f : <mem>
// r1 can be reused multiple times without interference
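The following Java sketch reproduces the three-register walkthrough above (the `Interval` class, register names, and the choice to expire an interval whose end point is at or before the current start point are my own assumptions drawn from the walkthrough, not the course's code):

```java
import java.util.*;

public class LinearScanSketch {
    static class Interval {
        final String var; final int start; final int end;
        Interval(String var, int start, int end) { this.var = var; this.start = start; this.end = end; }
    }

    public static void main(String[] args) {
        List<Interval> intervals = List.of(              // already sorted by start point
            new Interval("a", 1, 4), new Interval("b", 2, 9), new Interval("c", 3, 9),
            new Interval("d", 4, 5), new Interval("e", 5, 8), new Interval("f", 6, 10));
        Deque<String> free = new ArrayDeque<>(List.of("r1", "r2", "r3"));
        Map<String, String> assignment = new LinkedHashMap<>();
        // tentatively assigned intervals, ordered by increasing end point
        TreeSet<Interval> active =
            new TreeSet<>(Comparator.comparingInt((Interval i) -> i.end).thenComparing((Interval i) -> i.var));

        for (Interval cur : intervals) {
            // finalize intervals that have expired before the current start point
            while (!active.isEmpty() && active.first().end <= cur.start) {
                free.push(assignment.get(active.pollFirst().var));
            }
            if (!free.isEmpty()) {
                assignment.put(cur.var, free.pop());          // tentative assignment
                active.add(cur);
            } else {
                // spill heuristic: evict whichever interval extends furthest into the future
                Interval furthest = active.last();
                if (furthest.end > cur.end) {                 // take over a tentatively assigned register
                    active.remove(furthest);
                    assignment.put(cur.var, assignment.get(furthest.var));
                    assignment.put(furthest.var, "<mem>");
                    active.add(cur);
                } else {
                    assignment.put(cur.var, "<mem>");         // spill the current interval itself
                }
            }
        }
        assignment.forEach((v, r) -> System.out.println(v + " : " + r));
        // prints a : r1, b : r2, c : r3, d : r1, e : r1, f : <mem>
    }
}
```

Shrinking the free list to two registers reproduces the second scenario, except that on the tie at statement 3 this sketch happens to spill `c` rather than `b` (the notes point out either choice is acceptable).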
- in a full interference graph, generated from running the full liveness analysis algorithm:
  - $k \geq s$, where $k$ is the minimum number of colors and $s$ is the size of the max clique
  - however, if all the live ranges are presented as intervals, $k = s$
    - in general, graph coloring is NP-complete
    - by using intervals, graph coloring becomes a polynomial time operation
- liveness analysis transforms the register allocation problem into a graph coloring problem:
  - do we lose anything in this transformation?
    - ie. is the register allocation problem itself also an NP-complete problem?
  - we can transform in the other direction, from a coloring problem to an allocation problem
    - eg. transform a given graph into an equivalent code segment, and run it through liveness allocation in order to color the original graph
Graph coloring example:
\begin{center} \begin{tikzpicture} \node[shape=circle,draw=black] (A) at (0,0) {a}; \node[shape=circle,draw=black] (B) at (3,0) {b}; \node[shape=circle,draw=black] (C) at (6,0) {c}; \path[-] (A) edge (B); \path[-] (B) edge (C); \end{tikzpicture} \end{center}
From graph coloring to a representative program:
a = 1
b = 2
c = 3
// consider the graph as an interference graph
print(a + b) // a and b overlap, ie. both are live
print(b + c) // b and c overlap
// meanwhile, a and c are never live at the same time
- because this transformation is possible, register allocation is indeed an NP-complete problem:
  - thus, if the compiler is going to do anything smart during compilation, it should be register allocation
  - doing register allocation by hand is infeasible, considering the very good linear approximation the compiler can quickly perform
Sample code illustrating when copy propagation can be used:
b = a // copying statement from a to b (without changes), can we get rid of it?
c = b
d = b
// ==>
b = a // this is a useless statement
c = a
d = a
// ==>
c = a
d = a
- when could this type of copy propagation be performed?
  - formally, given the control flow graph of the code:
    - the top node (`b = a`) dominates the lower node (`c = b`)
      - ie. every path from the entry that reaches the lower node must go through the top node
      - ie. we will always "hit" it
    - there are no updates to `a` or `b` between `b = a` and `c = b`
    - then we can rewrite `c = b` as just `c = a` (and later remove the now-useless `b = a`)
  - when translating Java to Sparrow, we generated a lot of extra copy statements for ease of development
    - this is exactly what copy propagation cleans up
- comparing types of data-flow analysis:
  - in liveness analysis, we keep track of defines and uses in the sets $in$ and $out$:
    - the variables range over sets of program variables
    - given constraints of equalities over sets, we work backwards
  - in copy propagation, we keep track of "gens" and "kills", again in the sets $in$ and $out$:
    - the variables range over sets of copy statements
    - given constraints of equalities over sets, we work forwards, ie. push copy statements forward
- defining copy propagation equations:
  $$in[n] = \bigcap_{p \in pred(n)} out[p]$$ $$out[n] = gen(n) \enskip \bigcup \enskip (in[n] - kill(n))$$
  - for the statement $n$ in the program, where:
    - $pred(n)$ contains the predecessors of the statement in the control flow graph
    - $gen(n) = \{n\}$ if $n$ is a copy statement, and $\emptyset$ otherwise
    - $kill(n)$ contains any copy statement `b = a` in the program whenever $n$ assigns to either `b` or `a`
  - these are very similar to the liveness analysis equations, but work in the other direction (forwards rather than backwards)
    - in addition, liveness analysis uses a large union, ie. cares about all live variables
    - while copy propagation uses a large intersection, ie. only keeps copy statements valid along every path
- algorithm:
  - initialize all set variables to $\emptyset$
  - repeatedly update $in$ and $out$, until no change
  - runs with $O(n^4)$ complexity, or $O(n^2)$ complexity with more detailed analysis
- another similar optimization, called constant propagation, can also be performed
\newpage{}
- we have not yet addressed calling conventions, ie. implementing the procedure abstraction in the compiler
- need to ensure that every procedure:
  - inherits a valid run-time environment
  - restores an environment for its parent
- the procedure abstraction involves some caller $p$ and callee $q$:
  - on entry of $p$, establish an environment
  - around a call of $q$, preserve the environment of $p$
  - on exit of $p$, tear down its environment
  - in between, handle addressability and proper lifetimes
- an activation record is a stack frame:
  - each procedure call will have an associated activation record at run time
  - two registers point to relevant activation records:
    - the stack pointer, pointing to the end of the current stack
    - the frame pointer, pointing to the end of the parent's stack
  - information stored on the stack:
    - incoming arguments from the caller
      - by convention, saved right before the frame pointer
    - return address
    - local variables
    - temporaries
    - saved registers
    - outgoing arguments
    - return value?
  - because functions may make different method calls dynamically, the number of outgoing arguments varies from call to call
    - thus frame sizes will vary as well
  - how can we pass return values back up the stack to the caller?
    - in addition to storing outgoing arguments just before its stack pointer, the caller will also allocate space for the callee to place the return value in
      - ie. at the callee's frame pointer
    - thus this shared area between the two stack frames will hold arguments and return values
      - it is better to address values near the frame pointer, whose offsets are known, rather than near the stack pointer
- the procedure linkage convention divides responsibility between the caller and callee
  - caller pre-call $\rightarrow$ callee prologue $\rightarrow$ callee epilogue $\rightarrow$ caller post-call
- caller pre-call:
- allocate basic frame
- evaluate and store params
- store return address (may be a callee responsibility on different systems)
- jump to child
- callee prologue:
  1. save registers, state
  2. store `FP`, ie. perform the dynamic link
  3. set the new `FP`
  4. store the static link
  5. extend the basic frame for local data and initializations
  6. fall through to code
  - steps 1-5 may be done in varying orders
  - the static link is a feature used in languages that allow for nested functions:
    - the dynamic link points back up to the calling procedure's frame
      - following these links creates a dynamic chain of nested calls
    - while the static link will point up to the statically nested parent's frame:
      - specifically, the frame corresponding to the nearest call of the parent (since the parent may have been called multiple times)
      - following these links creates a static chain of statically nested methods
    - ie. static links allow accessing of nonlocal values
- callee epilogue:
  - store the return value (in the shared frame area)
  - restore registers, state
  - cut back to the basic frame
  - restore the parent's `FP`
  - jump to the return address
- caller post-call:
- copy return value
- deallocate basic frame
- restore parameters (if copy out)
- possibilities for dividing up the work of saving and restoring registers:
  1. difficult: callee saves caller's registers
     - the call must include a bitmap of the caller's used registers
  2. easy: caller saves and restores its own registers
  3. easy: callee saves and restores its own registers
  4. difficult: caller saves callee's registers
     - the caller must use a bitmap held in the callee's stack frame, or some method table
  5. easy: callee saves and restores all registers
  6. easy: caller saves and restores all registers
  - in practice, approaches (2) and (3) are used
- RISC-V registers:
  - `x0` or `zero` holds hard-wired zero
  - `x1` or `ra` holds the return address (caller-saved)
  - `x2` or `sp` holds the stack pointer (callee-saved)
  - `x3` or `gp` holds the global pointer
  - `x4` or `tp` holds the thread pointer
  - `x5-7` or `t0-2` hold temporaries (caller-saved)
  - `x8` or `s0/fp` holds the frame pointer (callee-saved)
  - `x9` or `s1` is a callee-saved register
  - `x10-11` or `a0-1` hold function arguments and return values (caller-saved)
  - `x12-17` or `a2-7` hold function arguments (caller-saved)
  - `x18-27` or `s2-11` are callee-saved registers
  - `x28-31` or `t3-6` are caller-saved registers
- RISC-V linkage:
  - pre-call:
    - pass arguments in registers `a0-7` or on the stack
    - save caller-saved registers
    - execute a `jalr` that jumps to the target address and saves the return address in `ra`
  - post-call:
    - move the return value from `a0`
    - remove space for arguments from the stack
  - prologue:
    - allocate space for local variables and saved registers
    - save registers, eg. `ra` and callee-saved registers
  - epilogue:
    - restore saved registers
    - copy the return value into `a0`
    - clean up the stack and return
\newpage{}
- interpreters are not compilers:
  - an interpreter executes a program "on-the-fly", without an explicit total translation to another language:
    - it will still take in code parsed into a data structure, as a compiler does
    - ie. an interpreter works with a single language, while a compiler works with at least two
    - typically incurs a performance overhead of ~10x
  - a Java program may be converted to Java bytecode using `javac`:
    - the bytecode is then often run using an interpreter
    - the Java VM has the choice of using an interpreter or a compiler
      - compiling the bytecode is an investment into the future, and if the time lost isn't regained, it is better to just run the interpreter
    - Java bytecode is more stable than Java source code, and was originally intended to be used for easy distribution
  - could we write an interpreter for source-level Java code?
    - eg. like the interpreters used in CS132 for Sparrow, Sparrow-V, and RISC-V
    - all outputs / result states should be comparable
- given the simple grammar `e ::= c | e + e`:
  - our interpreter should take a single expression `e` as input and give a numeric output
  - going from program syntax to a different type like a Java integer, rather than to another language's syntax
Implementing a simple interpreter:
class Interpreter implements Visitor {
  int visit(Nat n) { return n.f0; }
  int visit(Plus n) {
    return (n.f0).accept(this) + (n.f1).accept(this);
  }
}
Compared to a simple compiler for the same grammar:
class State { String p; int k; }
class Compiler implements Visitor {
  String visit(Nat n, State s) {
    s.p += String.format("v%s = %s", s.k, n.f0);
    return String.format("v%s", s.k++);
  }
  String visit(Plus n, State s) {
    String v0 = n.f0.accept(this, s);
    String v1 = n.f1.accept(this, s);
    s.p += String.format("v%s = %s + %s", s.k, v0, v1);
    return String.format("v%s", s.k++);
  }
}
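As a quick illustration of the difference (my own example, using the visitors above): interpreting the expression `1 + 2` immediately produces the Java integer `3`, while the compiler instead appends something like `v0 = 1`, `v1 = 2`, `v2 = v0 + v1` to `s.p` and returns the name `v2`; the work of actually computing `3` is deferred to whenever the generated program is run.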
- the same grammar with a boolean extension: `e ::= true | !e`, as well as the previous rules
  - with more types, the interpreter has to work with many different types at once:
    - contrasted with compilers, which only handle strings
  - thus, this is the fundamental reason why interpreters are so much slower:
    - types have to be unpacked and repacked together
      - ie. untagged and tagged
    - in addition, with loops and recursion, compilers compile the code just once
      - while interpreters have to recursively re-visit and re-execute loops multiple times
Another interpreter for more types:
class Value {}
class Nat extends Value {
  int i;
  Nat(int i) { this.i = i; }
  <A> A accept(ValueVisitor v) {
    return v.visit(this);
  }
}
class Boolean extends Value { ... }
class Interpreter implements Visitor {
  Value visit(Plus n) {
    // n.f0 -> check isNat -> unpack and getNat ->
    // n.f1 -> check isNat -> unpack and getNat ->
    // pack both ints using addition (+) into a Value, and return it,
    // ie. unpack, and then repack into a value
  }
  ...
}
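A hedged sketch of how the commented body above could be filled in, plugging into the `Interpreter` class shown (the field names `f0`/`f1`, the use of `Nat`, and the error handling are assumptions based on the earlier snippets, not the course's exact API):

```java
// inside class Interpreter from above
Value visit(Plus n) {
  Value v0 = n.f0.accept(this);   // evaluate the left operand to a tagged Value
  Value v1 = n.f1.accept(this);   // evaluate the right operand
  if (v0 instanceof Nat && v1 instanceof Nat) {       // untag: check both are Nats
    int sum = ((Nat) v0).i + ((Nat) v1).i;            // unpack the raw ints and add them
    return new Nat(sum);                              // repack the result into a Value
  }
  throw new RuntimeException("type error: + applied to a non-number");
}
```

The `instanceof` checks and the re-wrapping into `Nat` are exactly the untag/retag overhead the notes point to.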
- interpreter must maintain state of the Sparrow program:
- program
- heap
- current block
- environment
- remaining executing block ie. program counter
- must distinguish between local and global state:
- program and heap are global
- rest are local (can be represented in some kind of stack)
- stack of maps!
Part of the Sparrow interpreter:
public class Interpreter extends Visitor {
  Program prog;
  List<Value[]> heap = new ArrayList<Value[]>();        // Value tuples
  Stack<LocalState> state = new Stack<LocalState>();    // block, environment, and pc

  public void visit(Subtract n) {
    Value v1 = access(n.arg1); // unpacking to Value
    Value v2 = access(n.arg2);
    if ((v1 instanceof IntegerConstant) && (v2 instanceof IntegerConstant)) {
      int rhs = ((IntegerConstant) v1).i - ((IntegerConstant) v2).i;
      Value v = new IntegerConstant(rhs); // repack
      update(n.lhs, v);
      state.peek().pc = state.peek().pc + 1;
    }
  }

  public void visit(Store n) {
    Value vl = access(n.base);
    Value vr = access(n.rhs);
    if (vl instanceof HeapAddressWithOffset) {
      HeapAddressWithOffset hawo = (HeapAddressWithOffset) vl;
      heap.get(hawo.heapAddress)[(hawo.offset + n.offset) / 4] = vr;
      state.peek().pc = state.peek().pc + 1;
    }
  }

  public void visit(Call n) {
    Value v = access(n.callee);
    if (v instanceof FunctionName) {
      FunctionName ce = (FunctionName) v;
      GetFunctionDecl gfd = new GetFunctionDecl(ce);
      prog.accept(gfd);
      FunctionDecl fd = gfd.result;
      List<Identifier> formalParameters = fd.formalParameters;
      List<Value> actualParameters = new ArrayList<Value>();
      for (Identifier s : n.args) {
        actualParameters.add(access(s));
      }
      // create and push new local environment
      Map<String, Value> m = new HashMap<String, Value>();
      for (int i = 0; i < formalParameters.size(); i++) {
        m.put(formalParameters.get(i).toString(), actualParameters.get(i));
      }
      state.push(new LocalState(fd.block, m, 0)); // func's block, new env, pc = 0
      stepUntilReturn();
      Value result = access(fd.block.return_id);
      state.pop();
      update(n.lhs, result);
      state.peek().pc = state.peek().pc + 1;
    }
  }
}
Interpreter helper functions:
Value access(Identifier id) {
  return state.peek().env.get(id.toString());
}
void update(Identifier id, Value v) {
  state.peek().env.put(id.toString(), v);
}
void update(Register r, Value v) {
  registerFile.put(r.toString(), v);
}
\newpage{}
- constant propagation:
  - similar to copy propagation, but propagates constants instead of copies of variables
- loop unrolling:
  - when a loop always runs the same number of times, we can get rid of the code used to maintain loop state and just duplicate the body code
- loop invariant code motion:
  - move a part of the loop outside the loop, since it computes the same thing on each iteration (see the sketch after this list)
  - eg. some kind of loop initialization
- common subexpression elimination:
  - motivated by working with arrays
  - cache an array element that would otherwise be accessed (expensively) multiple times without its value changing
  - done similarly to copy propagation
- polyhedral optimization:
  - motivated by working with loops and nested loops
  - reorders or splits loops to allow for optimal parallelism
    - the loops themselves can also be run in any order
  - tries many different loop orderings
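A small before/after Java sketch of loop invariant code motion and common subexpression elimination (my own example, not from the course):

```java
class LoopOptSketch {
    // before: recomputes base * scale every iteration and loads a[i] twice
    static void clampBefore(int[] a, int base, int scale) {
        for (int i = 0; i < a.length; i++) {
            int limit = base * scale;          // loop invariant subcomputation
            if (a[i] + a[i] > limit)           // a[i] is a common subexpression
                a[i] = limit;
        }
    }

    // after: the invariant is hoisted and the array element is cached
    static void clampAfter(int[] a, int base, int scale) {
        int limit = base * scale;              // loop invariant code motion
        for (int i = 0; i < a.length; i++) {
            int ai = a[i];                     // common subexpression elimination
            if (ai + ai > limit)
                a[i] = limit;
        }
    }
}
```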
\newpage{}
Euclid's algorithm:
public static int gcd(int x, int y) {
return y == 0 ? x : gcd(y, (x % y));
} // already first-order form
Even-Odd deciders:
static Function <Integer, Boolean>
even = n -> n == 0 ? true : Test.odd.apply(n-1);
static Function <Integer, Boolean>
odd = n -> n == 0 ? false : Test.even.apply(n-1);
// already first-order form
Fibonacci to tail form:
static Function<Integer, Integer>
fib = n -> n <= 2 ? 1 : Test.fib.apply(n-1) + Test.fib.apply(n-2);
static BiFunction<Integer, Function<Integer, Integer>, Integer>
fibCPS = (n, k) -> n <= 2
? k.apply(1)
: Test.fibCPS.apply(n-1,
v1 -> Test.fibCPS.apply(n-2,
v2 -> k.apply(v1 + v2)));
Tail form Fibonacci to first-order:
class FibRec1 implements Continuation {
  Integer n; Continuation k;
  public FibRec1(Integer n, Continuation k) {
    this.n = n; this.k = k;
  }
  public Integer apply(Integer v) {
    return Test.fibCPSadt.apply(n-2, new FibRec2(v, k));
  }
}
class FibRec2 implements Continuation {
  Integer v1; Continuation k;
  public FibRec2(Integer v1, Continuation k) {
    this.v1 = v1; this.k = k;
  }
  public Integer apply(Integer v) {
    return k.apply(v1 + v);
  }
}
static BiFunction<Integer, Continuation, Integer>
fibCPSadt = (n, k) -> n <= 2 ? k.apply(1) :
  Test.fibCPSadt.apply(n-1, new FibRec1(n, k));
- given the following grammar:
  - $A ::= \varepsilon \mid zCw$
  - $B ::= Ayx$
  - $C ::= ywz \mid \varepsilon \mid BAx$
- then:
  - $\textit{FIRST}(A) = \{z\}$, $\textit{FIRST}(B) = \{y, z\}$, $\textit{FIRST}(C) = \{y, z\}$
  - $\textit{NULLABLE}(A) = \textit{true}$, $\textit{NULLABLE}(B) = \textit{false}$, $\textit{NULLABLE}(C) = \textit{true}$
- we can make the following observations for each nonterminal on the RHS:
  - $w \in \textit{FOLLOW}(C)$
  - $y \in \textit{FOLLOW}(A)$
  - $\textit{FIRST}(A) \subseteq \textit{FOLLOW}(B)$
  - $x \in \textit{FOLLOW}(B)$
  - $x \in \textit{FOLLOW}(A)$
- thus:
  - $\textit{FOLLOW}(A) = \{x, y\}$, $\textit{FOLLOW}(B) = \{x, z\}$, $\textit{FOLLOW}(C) = \{w\}$
- therefore the grammar is not LL(1), since for $C$: $\textit{FIRST}(ywz) \cap \textit{FIRST}(BAx) \neq \emptyset$
- What is the fewest number of registers to entirely allocate the following code?
a = 1 // 1
b = 10 // 2
c = a + a // 3
d = a + c // 4
e = c + d // 5
f = b + 8 // 6
c = f + e // 7
f = e + c // 8
b = c + 5 // 9
return b + f // 10
Live ranges, from inspection:
a : [1, 4]
b : [2, 6], [9, 10]
c : [2, 6], [7, 9]
d : [4, 5]
e : [5, 8]
f : [6, 7], [8, 10]
Corresponding interference graph:
\begin{center} \begin{tikzpicture} \node[shape=circle,draw=black,fill=red,label=left:$s_2$] (A) at (0,0) {a}; \node[shape=circle,draw=black,fill=yellow,label=$s_0$] (B) at (0,3) {b}; \node[shape=circle,draw=black,fill=orange,label=$s_1$] (C) at (2.5,4) {c}; \node[shape=circle,draw=black,fill=green,label=$s_3$] (E) at (5,0) {e}; \node[shape=circle,draw=black,fill=red,label=left:$s_2$] (F) at (2.5,-1) {f}; \node[shape=circle,draw=black,fill=red,label=$s_2$] (D) at (5,3) {d}; \path[-] (A) edge (B); \path[-] (A) edge (C); \path[-] (B) edge (A); \path[-] (B) edge (C); \path[-] (B) edge (D); \path[-] (B) edge (E); \path[-] (B) edge (F); \path[-] (C) edge (E); \path[-] (C) edge (F); \path[-] (E) edge (F); \end{tikzpicture} \end{center}