Discussions
Today we discussed how to measure the productivity of rules. Empirically, we found that rules that consume a lot of runtime but are not productive are good candidates for hand-optimisation.
The current Soufflé profiler measures productivity as the number of tuples a rule produces vs. the number of tuples of the whole Soufflé program, i.e.,
productivity_0(R) = #tuples produced by R / #tuples in the IDB
The productivity is in the range between 0 and 1. A productivity of zero means that the rule has not produced a single tuple, and a productivity of one means that all tuples in the IDB were produced by rule R.
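As a small sketch of this relative measure, the snippet below computes productivity_0 for a set of rules; the rule names and tuple counts are purely hypothetical:

```python
# Hypothetical per-rule tuple counts; the IDB size is the total across all rules.
tuples_produced = {"reach": 120_000, "alias": 3_000_000, "cycle": 5}
idb_size = sum(tuples_produced.values())

def productivity_0(rule):
    """Fraction of all IDB tuples produced by the given rule."""
    return tuples_produced[rule] / idb_size

for rule in tuples_produced:
    print(rule, productivity_0(rule))
```

Here the "alias" rule dominates the IDB, so its productivity_0 is close to one, while "cycle" is close to zero.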
The aforementioned definition of productivity is a relative measure.
It gave a good indication of productivity for points-to analysis programs and other problems. I strongly encourage looking into these numbers to find optimisation opportunities.
Normally, only 10-15 rules consume most of the time. Finding the least productive rules among the longest-running ones is a good starting point for optimisations.
However, the above metric of productivity is not the only possible metric for productivity of a rule. We could think of other alternatives:
Consider the rule R(x,y,z) :- A(x), B(y), C(z), which produces the cross product of the relations A, B, and C. The associated loop nest of the rule is super-productive: not a single tuple is wasted in any loop when producing a new tuple for relation R. The loop nest could be written as
```
for x in A:
  for y in B:
    for z in C:
      project (x,y,z) into R
```
and we can see that there is not a single constraint in any loop; hence we have |A||B||C| invocations of project. Basically, for each tuple configuration we can generate a proof step. Hence, we could phrase productivity as an absolute measurement, independent of other rules and with respect to the potential work of a loop nest. For the above example, the potential work of the loop nest is |A||B||C|. We could express productivity as follows,
productivity_1(A :- B1, …, Bk) = #projections / (|B1| ⋯ |Bk|)
For a cross product, the productivity is one. The amount of work becomes independent of the total work a program performs; this metric is focused on a single rule. For more constrained rules, the productivity may be less than one, and if a rule does not have a single successful projection, we have a productivity of zero. Hence, we have the trivial lower and upper bounds of zero and one for this metric.
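A quick sketch of productivity_1 on the cross-product rule above; the relation contents are made up for illustration:

```python
# Cross product R(x,y,z) :- A(x), B(y), C(z): every iteration projects a tuple.
A, B, C = [1, 2], ["a", "b", "c"], [True, False]
projections = 0
R = []
for x in A:
    for y in B:
        for z in C:
            R.append((x, y, z))   # no constraint filters this loop nest
            projections += 1

potential_work = len(A) * len(B) * len(C)
productivity_1 = projections / potential_work
print(productivity_1)  # 1.0: a pure cross product wastes no iteration
```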
For example, for rule R(x,z) :- A(x,y), B(y,z) the second loop is constrained, i.e.,
```
for (x,y1) in A:
  for (y2,z) in B:
    if y1 = y2:
      project (x,z) into R
```
The potential work of the loop nest is |A| * |B|. However, there might be significantly fewer projections, depending on the concrete instances of relations A and B, due to the constraint induced by the binding of variable y. Note that this productivity metric relates the potential work of a loop nest to the actual number of projections; hence this metric is agnostic to the actual ordering problem in rules. This productivity measurement may be more realistic for a rule — this needs to be shown.
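The constrained case can be sketched as follows; the relation contents are again hypothetical:

```python
# Constrained rule R(x,z) :- A(x,y), B(y,z): the shared variable y filters iterations.
A = [(1, 10), (2, 10), (3, 20)]
B = [(10, "p"), (30, "q")]
projections = 0
R = []
for (x, y1) in A:
    for (y2, z) in B:
        if y1 == y2:              # constraint induced by the binding of y
            R.append((x, z))
            projections += 1

potential_work = len(A) * len(B)  # |A| * |B| = 6
productivity_1 = projections / potential_work
print(productivity_1)
```

With these instances only two of six iterations project a tuple, so productivity_1 is 1/3, strictly below the cross-product bound of one.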
To take the order of predicates into account, we can refine the metric by counting the failed and successful iterations of each loop. The rule A :- B1, …, Bk corresponds to a conceptual loop nest as follows,
```
for t1 in B1:
  if c1(t1):
    for t2 in B2:
      if c2(t2):
        …
          for tk in Bk:
            if ck(tk):
              project (t1,…,tk) into A
```
where the conditions c1, …, ck are the encodings of Datalog constraints. We could introduce a pair of counters for each loop in the loop nest: one counter counts the number of successful iterations of a loop, and the other counts the number of failed iterations. Relating the two counters, we can express the productiveness of a single predicate. We could render each predicate in the body with a colour, assigned depending on its productiveness. We would transform the above loop nest as follows:
```
for t1 in B1:
  success1 += c1(t1);
  fail1 += !c1(t1);
  if c1(t1):
    for t2 in B2:
      success2 += c2(t2);
      fail2 += !c2(t2);
      if c2(t2):
        …
          for tk in Bk:
            successk += ck(tk);
            failk += !ck(tk);
            if ck(tk):
              project (t1,…,tk) into A
```
The productivity of the i-th predicate Bi in the rule A :- B1, …, Bk would be defined by:
productivity_2(Bi) = success(i) / (success(i) + fail(i))
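A minimal sketch of the instrumented loop nest for the two-predicate rule R(x,z) :- A(x,y), B(y,z), with hypothetical relation contents; the first loop carries no constraint, so c1 is trivially true:

```python
# Per-predicate success/fail counters for R(x,z) :- A(x,y), B(y,z).
A = [(1, 10), (2, 10), (3, 20)]
B = [(10, "p"), (30, "q")]
success = [0, 0]
fail = [0, 0]
R = []
for (x, y1) in A:
    success[0] += 1               # c1 is trivially true: no constraint on A
    for (y2, z) in B:
        if y1 == y2:              # c2: binding constraint on y
            success[1] += 1
            R.append((x, z))
        else:
            fail[1] += 1

def productivity_2(i):
    """Productiveness of the i-th predicate in the rule body."""
    return success[i] / (success[i] + fail[i])

print(productivity_2(0), productivity_2(1))
```

For these instances the first predicate is fully productive (1.0), while the second succeeds in only two of six iterations (1/3), which is exactly the kind of signal the per-predicate colouring would visualise.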
These are just two ideas for adding more metrics to Soufflé. They may give us a deeper understanding of productivity.