-
Notifications
You must be signed in to change notification settings - Fork 5
/
2018-1 Compiler Design.fex
941 lines (870 loc) · 34.3 KB
/
2018-1 Compiler Design.fex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
introduction
======
definitions:
programming language:
a precisely defined language
compiler:
program written in a host language
translates a programming language to another one
program:
sequence of expressions executed by a stateful processor
expression:
reads / modifies state & determines next expression
special E_stop halts the program
execution:
elaboration of expressions on some machine M
M defined by software (virtual)
M realized by hardware (physical)
translated (further compiled down) or interpreted (directly executed)
formal specifications:
operational semantics:
abstract machine A interprets sequences of steps
effects produced by A determines meaning
denotational semantics:
mathematical construct defining outcome
steps to reach desired outcome can be chosen
axiomatic semantics:
rules that describe effects of operations (assertions on programs state)
compiler:
reasons for translation:
L1 faster to develop, no hardware exists
L2 more efficient/stable, actual hardware exists
tasks:
preservation of semantics:
same results for all legal inputs
illegal inputs, non functional requirements may differ
resource management:
limited registers & storage, finite representation
respect execution environment (caching mechanisms)
constraint checking:
ensure program can indeed be translated
inject code to preserve constraints at runtime
models:
ahead of execution time (AOT):
compilation prior to execution
commonly used for languages without environments
just in time (JIT):
continuous compilation at runtime (optimize frequently used methods)
commonly used for languages with VM (java, c#)
architecture:
frontend:
read input, transform, check constrains
intermediate representation (IR):
compiler internal format (tree form or more complex structures)
expresses all language constructs/concepts
may consits of multiple stages & optimizer
backend:
generate code respecting machine constrains
design:
correctness:
design for correctness, tradeoff with performance
if (correct program & valid inputs) behaves same as L1
else behaves in a defined way (by L1 / L2 / machine)
compiler built in course
===
specifics for the JavaLi compiler build in course
ANTRL:
ambiguity resolution by taking document order
direct left-recursion (in same rule) allowed
IR (tree based):
tree determines evaluation order
code generator:
child code must be generated first
registers:
list of possible registers given by architecture
minimize usage by evaluating first left or first right
handle too few registers (abort, restructure, spilling)
template:
defines behaviour of IR node or collection of nodes
contains code & describes bookkeeping tasks
leaves result in register & reflects machine properties
process:
traverse tree, retrieving most specific template possible
emits code (by writing to buffer / file)
does bookkeeping as specified by template
const:
put value in register
if too big then require frontend to throw error
else deal with it in assembly (add to .data, assemble at runtime)
var:
if left delivers an address for parent
if right delivers a value
operators:
map each operator to an assembly instruction
put result of operation in one, allow reuse of other
assign:
move value from right to location at left
improve code generator:
special instructions with const childs (incr for +1, shift for *4)
directly use memory operand instead of movl in register first
frontend
====
grammar:
provide set of rules to generate strings
elements:
terminals, characters (a,...,z, empty)
non-terminals, syntactic variables (A,...,Z)
start symbol (S)
set of productions (Z->aZb)
L(G):
production => as left-hand side (LHS) -> right-hand side (RHS)
derivation =>* as repeated applied productions
L(G) is set of strings w, for all w there exists S =>* w
grammar type overview:
context-free (type-2) if only single non-terminal on LHS
linear if context-free and at most one NT in RHS
left-linear if linear and NT at left end of RHS
right-linear if linear and NT at right end of RHS
regular (type-3) if left- or right-linear
type-2 grammar:
efficient analysis techniques known
non-terminal evaluation order does not matter
can be recognised by stack machine
type-3 grammar:
generate regular languages, regex
recognised by finite deterministic automata
derivations:
start at S, then find next rule to get to word w ("top-down")
left-most (always pick left-most non-terminal)
right-most (always pick right-most non-terminal)
ambiguous grammar:
if more than one left-most derivation can be found
avoid ambiguity by changing language/grammar or defining precedence
lexical analysis (scanner):
transform character stream into tokens for parser
token assembly:
stop if character not in token anymore (like whitespace)
maximal much strategy:
put into current token as much as possible
but error reporting difficult, breaks down with complex languages
couple with parser:
ask parser for expected token types (top-down vs bottom-up)
syntactical analysis (parser):
prove that word is in grammar by providing steps of derivations
can choose way to parse if multiple non-terminals
employ regular expressions:
define regex for each grammar token ("top-down" approach)
because exact steps to generate symbols do not matter
can be recognised by both DFA and NFA (finite state machines)
construct parse tree:
as the result of the derivation, removing unnecessary information
root is S, nodes from productions, leaves from terminals
produce IR:
convert the parsed syntax tree to IR
removes intermediate steps & compresses info
like E(->Id) - OP(->+) -> E(->Id) transformed to +(->Var, ->Var)
top-down parser:
start with S, work towards retrieving w
simple stack machine:
decision based on current input & top of stack
error/accept as output actions
reduction to change symbol on stack
match to consume from input & pop from stack
bruteforce possible as words & grammars finite
strategies:
backtracking:
deterministic bruteforce variant
start with S on stack
if (top of stack & input match) then consume input
else if (can apply production) then apply production
else if (can revert & try new production) then do it
else if (can restore input & try new production) then do it
else reject
predictive:
if grammar in LL(k), the next k symbols determine action
try to construct parsing table M with (x=input, y=non-terminals)
define next production for all valid top-of-stack & k next input combinations
entries which stay empty throw a parsing error, x=$ and y=$ accepts
if such M can be constructed then grammar not ambiguous
parsing table M for LL(1):
LL-regular grammar LL(1) (left-to-right scan, left-most derivation)
compute FIRST(p):
from left to right evaluate; let current be first element of RHS
(1) if current is terminal or empty, then add to FIRST(p) and stop
(2) else add FIRST(current) - e to FIRST(p)
(2.1) if e element_of FIRST(current) then continue
(2.1.1) if no more following elements then add epsilon to FIRST(p)
(2.1.2) else continue at (1) with the following element
repeat for all RHS variants, start with RHS fulfilling (1)
compute FOLLOW(p):
find all occurrences of LHS in others RHS
let p' be the rule with the occurrence, let match be LHS in p'
(1) if S include $ in FOLLOW(p)
(2) if match at the end add FOLLOW(p') to FOLLOW
(3) else add FIRST(next) - e to FOLLOW
(3.1) if e element_of FIRST(next) then continue at (2) with the following element
repeat for all LHS
construct M:
write terminals to x axis, non-terminals to y axis
for each production A -> B with B != e write it to the FIRST(B) cells
for each B = e, write A -> e to all FOLLOW(A) cells
add extra row/column with $, writing ACCEPT in the crossing
process stack:
start with stack = $ S, pointer to first entry, tree = S
(1) lookup table with M[entry, S], replace with S on stack & write in tree
(2) if top of stack = entry then pop, advance pointer, goto (2)
(3) else goto (1)
bottom-up parser:
start at w, tries to get back to S
simple stack machine:
decision based on current input & top of stack
error/accept as output actions
push to shift input on stack
reduce to replace top of stack with production
strategies:
predictive:
stack as scratchpad; with symbols to be used later
delay recording production until its obvious
need to know derivations of RHS to know if its production applies
add a control-symbol between each symbol to preserve context
use parsing table M[control symbol, t] to get next action
parsing table M for SLR(1):
LR-regular grammar LR(1) (left-to-right scan, right-most derivation)
create LR(0) sets with FSA automaton
build simplified LR(1) parser (SLR(1)) parser
construct FSA:
dot . = processing position, augment grammar with S' -> .S
(1) create first state of FSA as S' -> .S
(2) compute closure by following NT after dot (S -> .A, A -> .aA | .b)
(3) create all possible transitions (a produces A -> a.A), goto (2)
give all n possible states a name I_k
create table:
compute first & follow, number states & productions
states to x axis
actions (terminals+$) & gotos (non-terminals-S') to y axis
for terminal transitions, write Shift::#state to corresponding cell
for non-terminal transitions, write #state to corresponding cell
for states with S' -> S. write accept to $ column
for states with A -> b., write Reduce::#production to FOLLOW(A) column
process stack:
start with stack = <0> & set pointer to first entry
(1) tablelookup with state (right-most control symbol) & entry
(2) if shift then write entry + <shift_state> then goto (1)
(3) if reduce then record reduction X->Y & remove all from stack
(3 cont.) tablelookup with state & X, then write X + <goto_state>
parser varia:
generators can create LR(0,1) items, LL(1..) parsers
canonical LR (CLR):
uses LR(1) items (lookahead of one)
may resolves ambiguities, but FSA needs more states
lookahead LR (LARL):
same number of states as SLR(1) but equally powerful than CRL
limitations of context-free grammars:
want to find errors with parser
L1 = { a c a | a element_of {a, b}*} for c separate body/definition
L1 could find undefined variables, but not context-free
L2 = {a^n b^n} for a definition and b calling site
L2 could check if method invocations are proper, but not context-free
use semantic analysis for that
semantic analysis
===
check properties (types & constraints)
perform transformations (casts, default values, initializers, ...)
attribute grammar:
context-free grammar extended with context-sensitive info
attributes:
attached to non-terminals
hold value
synthesised (from children) or inherited (from parent/siblings)
notation:
name non-terminals of production, possibly with index (<P>, <P_0>)
can add properties functioning like global variables (<A>.N)
can specify calculations (<A_0>.N = <A_1>.N + 1)
can specify conditions (iff <A_0>.N == <A_1>.N)
symbol table:
repository of program symbols
nesting may hides symbols (variable hides field)
enables checks for classes, fields, methods, variables
structure:
global symbol table:
contains class entries
contains constants such as true/false, void, ...
class entry:
fields (size, location (offset from this))
methods (location)
base class
method entry:
parameters (order, type)
variables (size, location (offset from frame pointer))
debugging:
definition location
construction:
walk over AST, create entries
field/method references:
implicit "this" reference if not specified
explicitly specified class reference
perform checks:
no duplicates (class / methods)
no undefined fields/method/classes (resolve reference)
valid inheritance (no cycles, default extend Object)
valid type (parameter passing, variable assigning)
valid type conversions (add int + float = float)
valid array access (0 <= i < A.length; might needs intermediates)
defined program start (main method)
no unused variables / fields
functions return value on all execution paths
protection system respected (private/protected access)
interface/abstract correctly implemented
delayed checks:
checks which can't be performed by simple analyzer
model execution checks, runtime checks, undecidable
delayed checks:
model execution checks:
variables initialized before usage
exceptions are caught
return statement is reachable
end of program is reachable
methods are called at least once
time/power policies respected
undecidable checks:
loops/program terminates
runtime checks:
array out of bounds access
implicit null checks by pointing to low & protected memory region
triggers protection error (SIGSEGV); VM catches & throws exception
software engineering
=====
patterns:
strategy:
encapsulate set of related algorithms into class hierarchy
interface (IStrategy) passed to context (IConsumer)
ConcreteStrategy implements IStrategy
specialization:
create new child class with more specific functionality
child inherits parent class, overriding or adding methods/fields
visitor:
performs a double invocation (caller/callee determined at runtime)
couples data structure w/ unrelated functionality (IElements)
with access methods w/ unrelated operations (IVisitor)
IVisitor with visitMyElement implemented by MyVisitor
IElement with accept(IVisitor) implemented by MyElement
MyElement implements accept() { visitor.accept(this) }
MyVisitor has method for each IElement implementation
the visitor or the element traverses (possibly tree) structure
observer:
presentation (IObserver) is notified about change in data (ISubject)
ISubject implements attach(Observer), detach(Observer), notify()
IObserver implements update()
subject calls update() upon change; then observer gets state of subject
collection extensions:
provide hook for externally defined functions for easy extensibility
high order function f (named such because passed as an argument)
MyCollection provides hook(IFunction f) which applies f to all elements
MyFunction implements IFunction with map(IElement)
software quality:
general approach:
identify quality dimensions
set priorities, discuss tradeoffs
establish objectives (time, cost, resources, performance)
perform measurements to fulfil objectives
quality assurance:
follow guidelines of software construction
pursue appropriate testing strategy
informal reviews, formal reviews, external audits
internal characteristics:
primarily interests developer
flexibility (can be modified for unintended environments)
reusability (can be used in another execution environment)
testability (test if requirements are fulfilled)
maintainability (address changing needs easily)
readability (consistency, conventions, structure, used constructs)
understandability (separation of concerns)
external characteristics:
primarily interests user
correctness of specification, design, implementation
robustness to handle invalid input / environment
usability (fit for purpose)
adaptability (incorporates with other systems)
integrity (access control)
traceability (access traces / logs provided)
accuracy (results as expected)
reliability (performs under assumed environment)
informal reviews:
execute tests & understand source code to catch errors early
explicitly not to assign blame or evaluate staff/project
walk-through:
devs present system
code reading:
feedback via comments, done by co-dev
inspection:
reviewers look at code, then meet with dev
roles:
moderator (by technical person, guides discussion)
author (by author, explains decisions & gives overview)
reviewers (point out good/bad & possible problems/defects)
scribe (possibly by moderator, records discussion)
management is absent
process:
planning (code sent to moderator, selects reviewers)
setup (short overview given by author)
preparation (code & documentation is read)
meeting (factual questions are discussed)
report (scribe publishes report)
follow-up (moderator assigns problems to repair)
openJDK example:
review process:
develop & test code
upload patch to server
inform mailing list of patch
reviewers inspect code & give feedback
push change set (code + description + reviewers)
roles:
higher accepted changesets (#) unlocks new capabilities
(0) contributor (has signed contribution statement)
(2) author (has own username & listed on project webpage)
(8) committer (push-right & sponsors others change sets)
(32) reviewer (decides if code is good enough)
code generation
===
primary requirements are correctness & performance
key activities:
code selection (decide assembly for IR nodes)
code scheduling (determine order of execution)
register allocation (decide which operand in register)
register assignment (decide which register used)
x86 rules:
one operand may resides in memory or intermediate
6 int registers available
calling conventions:
caller preserved %eax, %ecx, %edx
callee preserved %ebx, %ebp, %esp, %esi, %edi
callee must restore %ebp %esp
assembly:
assume a in %eax, b in %ecx
CODE_START
movl $2, %eax # save 2 to b
movl %eax, %ecx # save a to b
addl %eax, %ecx # b = a + b
incr %eax, # a++
notl %eax, # a = - a - 1
xchg %eax, %ecx # temp = a; a = b; b = temp
imull %ecx, # a = b*a (result in edx:eax, each 32 bits)
imull %eax, %ecx # b = a*b (32 bit result)
cmp $2, %eax # set 0 flag if equal
je LABEL_1 # jump if cmp before equals
jg, jge LABEL_1 # jump if first greater (+equal)
jl, jle LABEL_1 # jump if first lesser (+equal)
jmp *(%eax) # indirect jump
jmp (%eax) # jump to address
LABEL_1:
call LABEL_1 # push address of next instruction to stack, then jump
CODE_END
sample assembly:
CODE_START
push %ebp # preserve %ebp
mov %esp, %ebp # top of the stack is now in %ebp
sub $16, %esp # make stack frame bigger
push $0xDEADBEEF # push stuff on stack; increasing %esp+4
mov 4(%ebp), %eax # get 1st passed parameter %ebp
mov %ebp, %esp # restore original %esp
pop %ebp # restore original %ebp
ret # returns (equivalent to pop %eax jmp *%eax)
CODE_END
operand access:
constant:
put in register or use directly
enforce size contains
fields:
reference of object + offset from symbol table
use resulting memory location to read/write
arrays:
reference of array + blocksize * index
row-after-row called row-major (x[0][0], x[0][1], ...)
access:
evaluate expressions & in intermediate
evaluate from left to right
assignment statement:
assign to variable, field, array
handle register shortage; preserved on stack & override
conditional statement:
if-then:
eval/test condition
if true then continue, else jump to end
if-then-else:
eval/test condition
continue if true then jump to end, else jump to else
loop statement:
while expr:
eval/test condition
if true then continue then jump back, else jump to end
but one unconditional branch for each expression
while expr v2:
jump body
eval/test condition
if true jump back to body
method invocation:
call site from caller invokes callee on target with actual arguments
calling conventions:
decides what caller/callee prepares
decides where date, return address, return value are saved
call actions:
identify target & starting address of callee
handle parameters (evaluate, push into stack)
save caller saved registers
find space of temporaries
determine & store return address
find space for return value
set up activation record for callee
transfer control to callee
return actions:
restore registers
transfer control to caller
deliver return value
remove activation record of callee
heap:
eats up, from low to high adresses
contains object instances
stack:
eats down, from high to low addresses
setup by run-time system, some assembly instructions
structure:
frame/base pointer fp determines start of frame (high address, low)
stack pointer sp determines end of frame (low address, high)
return value, parameters + target, return address (caller)
old fp, old sp, locals, temps (callee)
access stack elements:
offsets persisted in symbol table
access parameters with fp + offset (go to caller area)
access locals with fp - offset (inside callee area)
access temps with fp - offset (can grow indefinitely)
object layout:
object referenced by pointer p
p + 0 pointer to v-table
p + 4,... the corresponding fields
v-table:
created completely for each class
method order in same order as base class
code location same as base class if not overridden
call method:
dereference object pointer (movl (%eax), %eax)
dereference v-table at this + 0 (movl (%eax), %eax)
call code location form v-table offset (call *4(%eax))
data flow analysis
===
correct & accurate for all possible inputs (conservative)
applications:
reason about correctness (ensure value assigned)
enable optimizations (compile-time expression evaluation)
useful for tools (find all references of symbol)
remove unused code
control flow graph (CFG):
constructed over single method
basic box is executed as a unit
construction:
create blocks (started by label, ended by return/goto/condition)
reformulate if statements to "TCond# = condition; if (TCond#)"
connect blocks which are executed after one another
name blocks in order like B#
add ENTRY block (pointing to first block)
add EXIT block (successor of final blocks like returns)
analysis:
local, inside basic block (like subexpression elimination)
global, over CFG (like null checks)
inter-procedural (across methods/functions)
point:
after specific statement s1, before s2
after specific basic block b1, before b2
make claims at specific points
paths:
for all pairs xi, xi+1 it holds one of these two
(a) xi is point before b, xi+1 is point after b
(b) xi is point after b, xi+1 is point before connected b2
infeasible to process all (possibly infinite) paths
side effects:
never:
local variables / parameters
field assignments (if no dereferencing needed)
in expressions:
null-dereference
non-pure method evaluations
invalid array access
data flow framework generalization:
D as direction of data flow (forward/backwards)
V domain of values (possible results + bottom & top)
|><| meet operator (operating on lattices)
F transfer function (process block)
termination:
stop as soon as nothing has changed anymore
show that max number of transitions per element exists
constant propagation:
replace constant variables with its values
process in block:
set all variables to bottom
go to specific value if const assigned
else go to top
transfer:
overapproximate reverse-T -> exact -> T
evaluation:
directly place constants in code
reaching definitions:
check if/where variable is defined which is used in statement
d reaches u if d is not killed on the path to u
process in block:
create table, for each variable all definitions
gen by taking last definition in block of variable
kill with all other definitions of variables in gen
overapproximate gen = {}, kill = {all}
transfer:
IN(B) = { d reaches B } = U OUT(predecessors B)
OUT(B) = { d leaves B } = gen_b U (IN(B) - kill_b)
start with OUT(start) = empty, overapproximate with all
iterate by determining IN for all, then OUT for all, repeat
evaluation:
directly place expression of definitions in block
if d2 element_of IN(B) can replace d2 value at its usages
uninitialized variables:
let OUT(start) = { one fake definition for each value }
if variable is used with fake definition in IN(b), then possibly undefined
live variable:
check if variable is used after some assignment
dead variables need not to be stored / computed
process in block:
def set with all var defined in B prior to any usage
use set with all var used in B prior to any definition
overapproximate def = {}, use = {all}
transfer:
OUT(B) = { var live after B } = U IN(successors B)
IN(B) = { var live before B } = use_b U (OUT(B) - def_b)
start with IN(end) = empty, overapproximate with all
iterate by determining OUT for all; then IN for all; then repeat
evaluation:
if val not in OUT(B) do not need to store/compute it
local analysis:
start with OUT(B) as set of live variables
then for each statement add new live variables
use-def / def-use chains:
for use of variable, persist all possible definitions
for definition of variable, persist all possible uses
do not forget loop back uses
available expressions:
available if evaluated on every path and no breaking definition afterwards
breaking definition if any element is redefined
but allowed to yield different values depending on path
process in block:
gen set for evaluated expression, no breaking definition
kill set for breaking definition & no following evaluation
overapproximate gen = {}, kill = {all}
transfer:
IN(B) = { expr reaches B } = intersect OUT(predecessors B)
OUT(B) = { expr leaves B } = gen_b U (IN(B) - kill_b)
start with OUT(start) = empty, overapproximate with all
start with OUT(other) = all, overapproximate with none
iterate by going top to bottom, IN then OUT; then repeat
evaluation:
if expr in IN(B) then do not need to recompute
register allocation
===
core tasks:
which operand resides in register (map operands -> virtual registers)
which register to use for operand (map virtual register -> register)
reduce complexity by assigning operands first to virtual registers
in second step assign the virtual registers to physical ones
live ranges:
range where virtual register is live (pending use)
can compute liveness of VR like any normal variable
calculation:
instructions as list, for each row compute L and R
L liveness (not-killed variable v used in current/following rows)
R reaching (not-overridden definition d_v made in before rows)
live range where v element_of L intersects with d_v element_of R
multiple d_v for same v possible -> multiple live ranges possible
combine liveness with reaching to decouple different definitions
granularity:
starts at left of assignment, ends on right of other
or whole statements (min-length therefore 2 lines)
or whole basic block (min-length therefore 2 blocks)
interference:
if ranges overlap, then interfere with each other
can model as graph; nodes are live ranges, edge iff overlap
graph coloring:
k-colorable if k colors enough such that none touch
compilers try several K-colorings, then choose the best one
K coloring kempe's algorithm (simplification step):
push nodes with #neighbours < K on stack & remove from graph
until <= K nodes left (may not be possible -> graph surgery)
color the remaining nodes (trivial bc #neighbours < K-1)
now pop from stack, append to graph & assign color
but not guaranteed to succeed
graph surgery:
K coloring may still possible, but hard to find
if not found then heuristically "spill" (remove) node
repeat until
all #neighbours < K requirement reached
then continue kempe's algorithm
pick spill victim heuristics:
at random, lowest cost estimate, lowest use count
estimating spill cost by estimating basic block usage
heuristics (loops execute 10 times), runtime info (if JIT)
spilled variables:
variables residing in memory bc not enough registers
constraint that at most one operand can be in memory
keep spare register:
spilled variables put into spare register upon usage
hence enforce a K-1 coloring (or K-2 if both must be in registers)
introduce temporary:
introduced for each occurrence of spilled variable
filled before instruction with value from stack
lives only for two lines (load & use)
rewrite IR accordingly, rebuild graph and retry coloring
splitting:
reduce spilling (temporary loads) by splitting live ranges
heuristics to know where to split
color improvements:
coalescing:
remove extra copy instruction if not needed
(last usage of v1 assigned to v2 -> could possibly rename v2 to v1)
coalesce by combining nodes (but maybe K coloring then impossible)
copy property edges by marking them to attempt to use same color
pre-color:
depending on instruction specific registers must be used (like imull)
per-color operands; not removed during simplification
phase coupling:
code selection (imull instruction)
code scheduling (determines order of excution)
register allocation (bounds #operands in registers)
multiple inheritance
===
which base symbol to use:
error if no guidance provided
add parameter which defines which base class is used
require cast to either A or B
use order of extends clause
invoke both methods / randomly select one
force symbol renaming
prohibit MI usage
diamond problem (AB extends A,B; A/B extends X):
referencing X symbols ambiguous
local changes to class may breaks behaviour at other places
object layout problems:
example:
AB extends A,B { a, b, method() {}}
B { b, method() {}}
A { a, method() {}}
overriding fields:
offset of base classes may not matches to compiled functions
therefore can not share v-tables
choose to keep code of A (fields of A with same offset)
recompile B methods (bc fields have different offset)
casts with v-table duplication:
offset of methods does not match, need to divide object layout
first part like A layout, referencing to AB v-table
second part start with reference to B' v-table, then the fields
when casting, offset this pointer (-0 if statically A/AB, -8 if B)
casts with pointer adjustments:
add additional this offset entry in v-tables
no code recompile needed anymore
instead adapt this pointer as needed before invocation
but runtime performance suffers (at every method call)
C++:
can extend with public virtual X
virtuals constructed only once in object hierarchy
attacks
===
control flow highjack:
highjack control flow; continue at attacker specified location
possibly arbitrary machine code or reusing existing code
corrupt code pointers (return address, function pointer, vtable)
defences:
non-executable data (NX)
data execution prevention (DEP)
return address:
"ret" implicit "jmp return pointer"
try to corrupt return pointer (point to attacker location)
must reuse already existing code (because of NX/DEP)
return oriented programming (ROP):
use code snippets ending with ret, chain them together
put on stack address1, value, address2, value, ...
address space layout randomization (ASLR):
mapping program addresses to hardware
randomize hardware addresses (stack, heap, memory base)
stack canaries:
adds data before %epb/return address
if data is overwritten; then buffer overflow happened
but effectiveness depends on secret & runtime slower
powerful attacker:
no data execution / code corruption (NX/DEP)
but arbitrary data modification assumed
control flow integrity (CFI):
ensure control flow stays within CFG
bc attacker needs to change CFG to execute arbitrary code
but too conservative, needs recompilation
dynamic CFI:
battles drawbacks of compile time CFI
constructs CFG at run-time, supports dynamic libraries
Java HotSpot VM
===
java execution architecture:
hardware, OS, JVM, Java SE/EE, Spring, user applications
why VM:
non-VM languages reimplement compiler, libraries, debugger for each OS
VM implemented for all OS/architectures, more portable code
compilation:
java/scala/javascript/python to bytecode (ahead of time, using javac)
bytecode executed on abstract stack machine (by VM)
execution:
template based interpreter maps bytecode to machine code sequence
JIT compilers produce machine code from bytecode
both compiled/interpreted access same stack/heap
garbage collector manages heap
JIT compilers:
C1 (limited optimizations, but fast compilation & small footprint)
C2 (aggressive optimizations, but high resource demands)
compile methods:
because compilation expensive specific appropriate methods to compile
interpreters collects profiling infos about methods
aggressive & optimistic compilation by C1/C2 stored in code cache
if guards assumptions proven wrong then interpreter takes over again
example optimization:
inline method code; guard ensures object of the expected class
tired compilation:
compile interpreter (fast start), C1 (fast compile), C2 (fast code)
collect 100 samples with interpreter, then execute C1 optimization
then collect 1000 samples with C1-compiled code, then C2 optimization
but then naive code cache inefficient (high fragmentation, bad locality)
code cache:
continuous chunk of memory, stores code generated by JIT compilers
types of code:
non-method code (small, immortal livetime)
C1 profiled code (medium, limited life time)
C2 non-profiled code (large, long lifetime because expensive compilation)
segment:
divide code cache into non-profiled, profiled, non-methods
better cache locality (ITLB misses decreased)
sweeper (GC for code) faster because optimized for each segment
compact strings:
reduce memory footprint of chars (may occupy up to 45% of data)
save as latin-1 (1 byte) instead of UTF-16 (2 bytes) (best effort)
changes GC (deduplicates strings), String (payload byte array)
ahead of time compilation:
compile to native code prior to VM launch (improves startup)
compiled code shared between VMs (C1 performance)
value types:
immutable, identity-less aggregates (user defined primitives)
no identity, all fields final, no subclassing
advantages:
better locality (no indirection, no header)
relaxes heap pressure due to stack allocation (less GC)
new possible optimizations (like scalarization)
minimal value types (MVT):
subset of value types, meant as early access
denote class with Value Capable Class (VCC) annotation
then JVM derives value type class (DVC)
new value type bytecodes / method handles
storage formats:
on heap, with header
in thread local value buffer (TLVB) with header
scalarized by JIT code without header on stack/registers)
flattened array/field with type infos stored in container header
flattening:
remove references by inlining value types
for fields (only non-static)
for arrays (respect long alignment)
GC needs support to find references in inlined value types
optimizations:
copy detection (reuse existing)
passing flattened value type (passing fields as arguments)
return flattened value type & class
method handles need additional handling
explorations:
currently only limited set of unit tests
check performance with runtime checks (legacy code, MVP code)