Discussions
Today we discussed how to measure the productivity of rules. Empirically, we found that rules that consume a lot of runtime but are not productive are good candidates for hand-optimisation.
The current Soufflé profiler measures productivity as the number of tuples a rule produces vs. the number of tuples of the whole Soufflé program, i.e.,
productivity_0(R) = #tuples produced by R / #tuples in the IDB
The productivity is in the range between 0 and 1. A productivity of zero means that the rule has not produced a single tuple, and a productivity of one means that all tuples in the IDB were produced by rule R.
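As a small sketch of this relative measure, the snippet below computes productivity_0 for a set of rules; the rule names and tuple counts are purely hypothetical:

```python
# Hypothetical per-rule tuple counts; the IDB size is the total across all rules.
tuples_produced = {"reach": 120_000, "alias": 3_000_000, "cycle": 5}
idb_size = sum(tuples_produced.values())

def productivity_0(rule):
    """Fraction of all IDB tuples produced by the given rule."""
    return tuples_produced[rule] / idb_size

for rule in tuples_produced:
    print(rule, productivity_0(rule))
```

Here the "alias" rule dominates the IDB, so its productivity_0 is close to one, while "cycle" is close to zero.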
The aforementioned definition of productivity is a relative measure.
It gave a good indication of productivity for points-to analysis programs and other problems. I strongly encourage looking into these numbers to find optimisation opportunities.
Normally, only 10-15 rules consume most of the time. Finding the least productive rules among the longest-running ones is a good starting point for optimisations.
However, the above metric of productivity is not the only possible metric for productivity of a rule. We could think of other alternatives:
Consider the rule R(x,y,z) :- A(x), B(y), C(z), which produces the cross product of the relations A, B, and C. The associated loop nest of the rule is super-productive: not a single tuple is wasted in any loop when producing a new tuple for relation R. The loop nest could be written as
```
for x in A:
  for y in B:
    for z in C:
      project (x,y,z) into R
```
and we can see that there is not a single constraint in any loop; hence we have |A||B||C| invocations of project. Basically, for each tuple configuration we can generate a proof step. Hence, we could phrase productivity as an absolute measurement, independent of other rules and with respect to the potential work of a loop nest. For the above example, the potential work of the loop nest is |A||B||C|. We could express productivity as follows,
productivity_1(A :- B1, …, Bk) = #projections / (|B1| ⋯ |Bk|)
For a cross product, the productivity is one. The amount of work becomes independent of the total work a program performs; this metric is focused on a single rule. For more constrained rules, the productivity may be less than one, and if a rule does not have a single successful projection, we have a productivity of zero. Hence, we have the trivial lower and upper bounds of zero and one for this metric.
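A quick sketch of productivity_1 on the cross-product rule above; the relation contents are made up for illustration:

```python
# Cross product R(x,y,z) :- A(x), B(y), C(z): every iteration projects a tuple.
A, B, C = [1, 2], ["a", "b", "c"], [True, False]
projections = 0
R = []
for x in A:
    for y in B:
        for z in C:
            R.append((x, y, z))   # no constraint filters this loop nest
            projections += 1

potential_work = len(A) * len(B) * len(C)
productivity_1 = projections / potential_work
print(productivity_1)  # 1.0: a pure cross product wastes no iteration
```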
For example, for rule R(x,z) :- A(x,y), B(y,z) the second loop is constrained, i.e.,
```
for (x,y1) in A:
  for (y2,z) in B:
    if y1 = y2:
      project (x,z) into R
```
The potential work of the loop nest is |A| * |B|. However, there might be significantly fewer projections, depending on the concrete instances of relations A and B, due to the constraint induced by the binding of variable y. Note that this productivity metric relates the potential work of a loop nest to the actual number of projections; hence this metric is agnostic to the actual ordering problem in rules. This productivity measurement may be more realistic for a rule — this needs to be shown.
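The constrained case can be sketched as follows; the relation contents are again hypothetical:

```python
# Constrained rule R(x,z) :- A(x,y), B(y,z): the shared variable y filters iterations.
A = [(1, 10), (2, 10), (3, 20)]
B = [(10, "p"), (30, "q")]
projections = 0
R = []
for (x, y1) in A:
    for (y2, z) in B:
        if y1 == y2:              # constraint induced by the binding of y
            R.append((x, z))
            projections += 1

potential_work = len(A) * len(B)  # |A| * |B| = 6
productivity_1 = projections / potential_work
print(productivity_1)
```

With these instances only two of six iterations project a tuple, so productivity_1 is 1/3, strictly below the cross-product bound of one.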
To take the order of predicates into account, we can refine the metric by counting the failed and successful iterations of each loop. The rule A :- B1, …, Bk corresponds to a conceptual loop nest as follows,
```
for t1 in B1:
  if c1(t1):
    for t2 in B2:
      if c2(t2):
        …
          for tk in Bk:
            if ck(tk):
              project (t1,…,tk) into A
```
where the conditions c1, …, ck are the encodings of Datalog constraints. We could introduce a pair of counters for each loop in the loop nest: one counter counts the number of successful iterations of a loop, and the other counts the number of failed iterations. Relating the two counters, we can express the productiveness of a single predicate. We could render each predicate in the body with a colour, assigned depending on its productiveness. We would transform the above loop nest as follows:
```
for t1 in B1:
  success1 += c1(t1);
  fail1 += !c1(t1);
  if c1(t1):
    for t2 in B2:
      success2 += c2(t2);
      fail2 += !c2(t2);
      if c2(t2):
        …
          for tk in Bk:
            successk += ck(tk);
            failk += !ck(tk);
            if ck(tk):
              project (t1,…,tk) into A
```
The productivity of the i-th predicate Bi in the rule A :- B1, …, Bk would be defined by:
productivity_2(Bi) = success(i) / (success(i) + fail(i))
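A minimal sketch of the instrumented loop nest for the two-predicate rule R(x,z) :- A(x,y), B(y,z), with hypothetical relation contents; the first loop carries no constraint, so c1 is trivially true:

```python
# Per-predicate success/fail counters for R(x,z) :- A(x,y), B(y,z).
A = [(1, 10), (2, 10), (3, 20)]
B = [(10, "p"), (30, "q")]
success = [0, 0]
fail = [0, 0]
R = []
for (x, y1) in A:
    success[0] += 1               # c1 is trivially true: no constraint on A
    for (y2, z) in B:
        if y1 == y2:              # c2: binding constraint on y
            success[1] += 1
            R.append((x, z))
        else:
            fail[1] += 1

def productivity_2(i):
    """Productiveness of the i-th predicate in the rule body."""
    return success[i] / (success[i] + fail[i])

print(productivity_2(0), productivity_2(1))
```

For these instances the first predicate is fully productive (1.0), while the second succeeds in only two of six iterations (1/3), which is exactly the kind of signal the per-predicate colouring would visualise.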
These are just two ideas for adding more metrics to Soufflé. They may give us a deeper understanding of productivity.