Automated Trace Collection #5

ishimwed opened this issue Dec 13, 2022 · 0 comments

Trace collection

Currently we manually instrument recursive functions to collect the problem size and the depth of recursion at the beginning of each recursive call.
We also instrument the recursive function and its helper functions (if any) to collect the problem size and the number of loop iterations per recursive step, as shown in the example below:

Descriptive example

def bubble_sort(arr, n, depth, trace_file):
    # L1: trace problem size and depth of recursion
    with open(trace_file, 'a') as f:
        print("{};{}".format(depth, n), file=f)
    if n == 1:
        return arr
    for i in range(n-1):
        if arr[i] > arr[i+1]:
            # L2: trace loop iterations
            if depth == 0:  # we only count iterations for one recursion step
                global counter
                counter = counter + 1
            arr[i], arr[i+1] = arr[i+1], arr[i]
    return bubble_sort(arr, n-1, depth+1, trace_file)

# Driver to collect all traces
import os
import random

counter = 0

def random_list(size):
    # random_list is not defined in the original example; minimal stand-in
    return [random.randint(0, 1000) for _ in range(size)]

def main():
    global counter
    counter = 0
    size = random.randint(1, 500)
    arr = random_list(size)
    depth = 0
    path = "./bubble_sort"
    try:
        os.mkdir(path)
    except OSError:
        pass
    file = "./bubble_sort/output-{}".format(size)  # naming convention
    bubble_sort(arr, size, depth, file)
    # L3: trace total loop iterations per problem size
    with open("./bubble_sort/traces", 'a') as f:
        print("{};{}".format(size, counter), file=f)
    counter = 0

if __name__ == '__main__':
    for i in range(100):  # run bubble sort a few times to collect enough traces
        main()

In the example above, we collect bubble sort execution traces by manually instrumenting the bubble_sort function at L1 to record the size of the input list (i.e. the problem size) and the depth of recursion, and storing these traces in a trace_file. We also track loop iterations at L2 by incrementing a loop counter, and at L3 we store this loop counter together with the original size of the input list in a separate trace file, following the file naming convention explained in the README.

Problem Description

In this issue we would like to improve Dynaplex in two ways: automate the code instrumentation process and parameterize the kind of traces to be collected.

Automated code instrumentation

We can automate the above process by inserting code statements into the Abstract Syntax Tree (AST) of the program (e.g. bubble sort). There are different libraries and frameworks for obtaining and editing the AST of a program, depending on the programming language. In this example, I suggest using CIL, which lets us access and manipulate the AST of a C program. However, this is not mandatory, as other frameworks accomplish the same task for whichever programming language you are comfortable with (e.g. the ast module for Python programs); see the sketch below.
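
As a minimal sketch of the idea for Python, using only the standard ast module (Python 3.9+ for ast.unparse): parse the source of the un-instrumented function, walk the AST with a NodeTransformer, and prepend a trace statement to the function body. The inserted statement here is a plain print of depth and n, assuming the target function exposes those parameters as in the bubble sort example; a real pass would write to the trace file and target only the recursive function of interest.

import ast

SOURCE = """
def bubble_sort(arr, n, depth):
    if n == 1:
        return arr
    for i in range(n - 1):
        if arr[i] > arr[i + 1]:
            arr[i], arr[i + 1] = arr[i + 1], arr[i]
    return bubble_sort(arr, n - 1, depth + 1)
"""

class TraceInserter(ast.NodeTransformer):
    # Prepend a trace statement to every function body
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        # Statement to insert: print("{};{}".format(depth, n))
        trace_stmt = ast.parse('print("{};{}".format(depth, n))').body[0]
        node.body.insert(0, trace_stmt)
        return node

tree = ast.parse(SOURCE)
tree = TraceInserter().visit(tree)
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # instrumented source, ready to save and execute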

In order to use CIL you need some knowledge of OCaml. Fortunately, I have a few examples to jump-start this task. The APR-input-rectification repo contains my PhD qualifying exam from 2019, in which I attempt to automatically repair C programs by rectifying their inputs. The repo contains different OCaml programs that insert, edit, or remove program statements using CIL. We can modify these programs to insert statements that track the necessary traces as explained above. To simplify this problem, there is a trace collection library (kshlib.h) that provides an API for trace collection (problem size, loop counters, depth).

Parametrize trace collection

So far Dynaplex only infers bounds on time complexity, so it collects traces related to problem size. We would like to extend Dynaplex to also infer bounds on other, arbitrary resources. For instance, we would like Dynaplex to infer upper bounds on the number of calls to specific APIs. To accomplish this, we augment the code instrumentation above to let the user specify a feature (time, API calls, or space) on which to infer an upper bound.
In the example below, we would like to infer an upper bound on the number of times API1 is called, in terms of the input size.

def foo(arr, depth, api1_counter, trace_file):
    # L1: trace api1_counter and depth of recursion
    with open(trace_file, 'a') as f:
        print("{};{}".format(depth, api1_counter), file=f)
    if len(arr) == 1:
        return arr
    for i in range(len(arr)-1):
        # L2: trace loop iterations
        if depth == 0:  # we only count iterations for one recursion step
            global counter
            counter = counter + 1
        api1(arr)  # api1 stands for the API whose calls we want to bound
        api1_counter += 1
    return foo(arr[:-1], depth+1, api1_counter, trace_file)  # shrink the problem so the recursion terminates

We introduce a counter variable to track API1 calls and instrument foo() to trace api1_counter instead of the problem size, then infer bounds on api1_counter.
Note: We still need to create a list of features that can be handled by Dynaplex and devise a way to collect the necessary traces for the bound inference.
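
One possible shape for this parametrization is sketched below; the Feature enum and the trace helper are hypothetical names for illustration, not existing Dynaplex code. The automated instrumentation would insert calls to a single helper, and the selected feature determines which value gets recorded.

from enum import Enum

class Feature(Enum):
    TIME = "time"    # problem size / loop iterations (what Dynaplex collects today)
    API = "api"      # number of calls to a specific API
    SPACE = "space"  # e.g. peak size of auxiliary data structures

def trace(feature, depth, value, trace_file):
    # Append "feature;depth;value"; the meaning of value depends on the
    # selected feature (problem size for TIME, an API counter for API, ...)
    with open(trace_file, 'a') as f:
        print("{};{};{}".format(feature.value, depth, value), file=f)

# bubble_sort above would call  trace(Feature.TIME, depth, n, trace_file)
# foo above would call          trace(Feature.API, depth, api1_counter, trace_file)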
