Automated Trace Collection #5

ishimwed opened this issue Dec 13, 2022 · 0 comments

Trace collection

Currently we manually instrument recursive functions to collect the problem size and the depth of recursion at the beginning of each recursive call.
We also instrument the recursive function and its helper functions (if any) to collect the problem size and the number of loop iterations per recursive step, as shown in the example below:

Descriptive example

def bubble_sort(arr, n, depth, trace_file):
    # L1: trace problem size and depth of recursion
    with open(trace_file, 'a') as f:
        print("{};{}".format(depth, n), file=f)
    if n == 1:
        return arr
    for i in range(n-1):
        if arr[i] > arr[i+1]:
            # L2: trace loop iterations
            if depth == 0:  # we only count iterations for one recursion step
                global counter
                counter = counter + 1
            arr[i], arr[i+1] = arr[i+1], arr[i]
    return bubble_sort(arr, n-1, depth+1, trace_file)

# Driver to collect all traces
import os
import random

counter = 0

def random_list(size):
    # random_list is not defined in the original example; minimal stand-in
    return [random.randint(0, 1000) for _ in range(size)]

def main():
    global counter
    counter = 0
    size = random.randint(1, 500)
    arr = random_list(size)
    depth = 0
    path = "./bubble_sort"
    try:
        os.mkdir(path)
    except OSError:
        pass
    file = "./bubble_sort/output-{}".format(size)  # naming convention
    bubble_sort(arr, size, depth, file)
    # L3: trace total loop iterations per problem size
    with open("./bubble_sort/traces", 'a') as f:
        print("{};{}".format(size, counter), file=f)
    counter = 0

if __name__ == '__main__':
    for i in range(100):  # run bubble sort a few times to collect enough traces
        main()

In the example above, we collect bubble sort execution traces by manually instrumenting the bubble_sort function at L1 to record the size of the input list (i.e. the problem size) and the depth of recursion, and storing these traces in a trace_file. We also track loop iterations at L2 by incrementing a loop counter, and at L3 we store this loop counter together with the original size of the input list in a separate trace file, following the file naming convention explained in the README.

Problem Description

In this issue we would like to improve Dynaplex in two ways: automate the code instrumentation process and parameterize the kind of traces to be collected.

Automated code instrumentation

We can automate the above process by inserting code statements into the Abstract Syntax Tree (AST) of the program (e.g. bubble sort). There are different libraries and frameworks for obtaining and editing the AST of a program, depending on the programming language. In this example, I suggest using CIL, which lets us access and manipulate the AST of a C program. However, this is not mandatory, as other frameworks accomplish the same task for whichever programming language you are comfortable with (e.g. the ast module for Python programs); see the sketch below.
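
As a minimal sketch of the idea for Python, using only the standard ast module (Python 3.9+ for ast.unparse): parse the source of the un-instrumented function, walk the AST with a NodeTransformer, and prepend a trace statement to the function body. The inserted statement here is a plain print of depth and n, assuming the target function exposes those parameters as in the bubble sort example; a real pass would write to the trace file and target only the recursive function of interest.

import ast

SOURCE = """
def bubble_sort(arr, n, depth):
    if n == 1:
        return arr
    for i in range(n - 1):
        if arr[i] > arr[i + 1]:
            arr[i], arr[i + 1] = arr[i + 1], arr[i]
    return bubble_sort(arr, n - 1, depth + 1)
"""

class TraceInserter(ast.NodeTransformer):
    # Prepend a trace statement to every function body
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        # Statement to insert: print("{};{}".format(depth, n))
        trace_stmt = ast.parse('print("{};{}".format(depth, n))').body[0]
        node.body.insert(0, trace_stmt)
        return node

tree = ast.parse(SOURCE)
tree = TraceInserter().visit(tree)
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # instrumented source, ready to save and execute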

In order to use CIL you need some knowledge of OCaml. Fortunately, I have a few examples to jump-start this task. The APR-input-rectification repo contains my PhD qualifying exam from 2019, in which I attempt to automatically repair C programs by rectifying their inputs. The repo contains different OCaml programs that insert, edit, or remove program statements using CIL. We can modify these programs to insert statements that track the necessary traces as explained above. To simplify this problem, there is a trace collection library (kshlib.h) that provides an API for trace collection (problem size, loop counters, depth).

Parametrize trace collection

So far Dynaplex only infers bounds on time complexity, so it collects traces related to problem size. We would like to extend Dynaplex to also infer bounds on other, arbitrary resources. For instance, we would like Dynaplex to infer upper bounds on the number of calls to specific APIs. To accomplish this, we augment the code instrumentation above to let the user specify a feature (time, API calls, or space) on which to infer an upper bound.
In the example below, we would like to infer an upper bound on the number of times API1 is called, in terms of the input size.

def foo(arr, depth, api1_counter, trace_file):
    # L1: trace api1_counter and depth of recursion
    with open(trace_file, 'a') as f:
        print("{};{}".format(depth, api1_counter), file=f)
    if len(arr) == 1:
        return arr
    for i in range(len(arr)-1):
        # L2: trace loop iterations
        if depth == 0:  # we only count iterations for one recursion step
            global counter
            counter = counter + 1
        api1(arr)  # api1 stands for the API whose calls we want to bound
        api1_counter += 1
    return foo(arr[:-1], depth+1, api1_counter, trace_file)  # shrink the problem so the recursion terminates

We introduce a counter variable to track API1 calls and instrument foo() to trace api1_counter instead of the problem size, then infer bounds on api1_counter.
Note: We still need to create a list of features that can be handled by Dynaplex and devise a way to collect the necessary traces for the bound inference.
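
One possible shape for this parametrization is sketched below; the Feature enum and the trace helper are hypothetical names for illustration, not existing Dynaplex code. The automated instrumentation would insert calls to a single helper, and the selected feature determines which value gets recorded.

from enum import Enum

class Feature(Enum):
    TIME = "time"    # problem size / loop iterations (what Dynaplex collects today)
    API = "api"      # number of calls to a specific API
    SPACE = "space"  # e.g. peak size of auxiliary data structures

def trace(feature, depth, value, trace_file):
    # Append "feature;depth;value"; the meaning of value depends on the
    # selected feature (problem size for TIME, an API counter for API, ...)
    with open(trace_file, 'a') as f:
        print("{};{};{}".format(feature.value, depth, value), file=f)

# bubble_sort above would call  trace(Feature.TIME, depth, n, trace_file)
# foo above would call          trace(Feature.API, depth, api1_counter, trace_file)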
