Skip to content
Bernhard Scholz edited this page Feb 8, 2018 · 2 revisions

Design Discussions

Profiler Discussion

Measure Productivity

Today we discussed how to measure the productivity of rules. Empirically, we found out that rules that consume a lot of runtime and are not productive, are good candidates for hand-optimisation.

The current soufflé-profiler measures productivity by measuring how many tuples a rule produces vs. the number of tuples of the whole soufflé program, i.e.,

productivity_0 ( R )  = #tuples produced by R / #tuples in the IDB

The productivity is in the range between 0 and 1. A productivity of zero means that the rule has not produced a single rule and a productivity of one means that all tuples in the IDB were produced by rule R.

The aforementioned definition measurement of productivity is a relative measure of productivity.

It gave a good indication of productivity for points-to analysis programs and other problems. I strongly encourage to look into these numbers for finding optimisation opportunity.

Normally only 10-15 rules consume most of the time. Finding the least productive rules in the set of longest running ones is a good starting point for optimisations.

However, the above metric of productivity is not the only possible metric for productivity of a rule. We could think of other alternatives:

Consider the rule R(x,y,z) :- A(x), B(y), c(z) produces the cross product of the relation A, B, and C. The associated loop-nest of the rule is super-productive: no single tuple is wasted in each loop for producing a new tuple for relation R. The loop nest could be written as

for x in A: 
 for y in B: 
  for z in C:
   project (x,y,z) into R

and we can see that there is no single constraint in a loop and hence we have |A||B||C| invocations of project. Basically, for each tuple configuration we can generate a proof step. Hence, we could phrase productivity as an absolute measurement independent of other rules and in respect to the potential work of a loop-nest. For the above example, the potential work of the loop-nest is |A||B||C|. We could express productivity as follows,

 productivity_1 ( A :- B1, …., Bk ) = |B1| …. |Bk| / |number of Projections|

For a cross product, the productivity is one. The amount of work becomes independent of the total work a program produces. This metric is focused on a single rule. For more constraint rules, the productivity may be less than one, and if a rule does not have a single successful projection, we have a productivity of zero. Hence, we have the trivial upper and lower bounds of zero and one for this metric.

For example, for rule R(x,z) :- A(x,y), B(y,z) the second loop is constrained, i.e.,
   for (x,y1) in A: 
     for (y2,z) in B:
       if y1 = y2: 
         project (x,z) into R

The potential work of the loop-nest is |A| * |B|. However, there might be significant less number of projections depending on the concrete instances of relation A and B due to the constraint induced by the variable binding of variable y. Note that this productivity metric sets in relation the potential work of a loop-nest with the actual number of projections and hence this metric is agnostic for the actual ordering problem in rules. The productivity measurement perhaps may be more realistic for a rule – this need to be shown.

To take the order of predicates into account we can refine the notion of metric by counting the failed and successful iterations of a loop. The Rule A :- B1, …, Bk produces a conceptualized loop nest as follows,

 for t1 in B1:
   if c1(t1):  
     for t2 in B2:
      if c2(t2): 
       …
           for tk in Bk: 
            if ck(tk): 
               project (t1,…., tk) into A

where the conditions c1, … , ck are the encodings of Datalog constraints. We could introduce a pair of counters for each loop in the loop nest. One counter counts the number of successful iterations of a loop and the other counter counts the number of failed iterations. Setting both numbers in relation we can express the productiveness of a single predicate. We could render each predicate in the body with a colour. A predicate in the rule is assigned a colour depending on its productiveness. We would transform the above loop as follows:

for t1 in B1:
   success1 += c1(t1);
   fail1 += !c1(t1); 
   if c1(t1):  
     for t2 in B2:

     success2 += c2(t2);
      fail2 += !c2(t2);
        if c2(t2): 
       …
           for tk in Bk:

             successk += ck(tk);
             failk += !ck(tk);       
            if ck(tk): 
               project (t1,…., tk) into A

The productivity of a predicate in the rule would be defined by:

productivity _2 ( success(i) / (success(i)+fail(i)) )

where i-th predicate in Rule A :- B1, …, Bk.

These are just two ideas to add more metrics to Souffle. They may give us a deeper understanding of productivity.