InvAASTCluster: AASTs and Invariant-Based Program Clustering

InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments

This is an implementation of the program clustering framework for introductory programming assignments (IPAs), described in the following paper: "InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments" (https://arxiv.org/abs/2206.14175).

InvAASTCluster is a novel and efficient approach for clustering submissions for introductory programming assignments (IPAs) based on the submissions' sets of program invariants and anonymized abstract syntax tree (AAST) representations.

InvAASTCluster was designed as an independent clustering tool. It can help evaluate students' submissions for IPAs by clustering semantically equivalent solutions to a programming exercise, and it can also be integrated into any clustering-based program repair tool for IPAs. Some program repair tools rely on a single reference implementation, provided by the lecturer, to repair every student program, so they can use only one correct implementation per repair. To address this, InvAASTCluster can also take an incorrect submission and find, among a set of correct student submissions, the correct solution closest to it. Each incorrect submission thus gets its own reference implementation, which may require fewer changes to fix the program.
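
The idea of picking the closest correct submission can be illustrated with a minimal sketch. It assumes each submission has already been reduced to a set of invariant strings, and it uses Jaccard similarity as a stand-in measure. Both the similarity measure and the names `jaccard` and `closest_correct` are assumptions made for this illustration, not the tool's actual implementation; the paper defines the representation and distance that InvAASTCluster really uses.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_correct(
    incorrect: FrozenSet[str],
    correct: Dict[str, FrozenSet[str]],
) -> Optional[Tuple[str, float]]:
    """Return (name, similarity) of the most similar correct submission.

    Returns None when no correct submissions are available.
    """
    best = None
    for name in sorted(correct):  # sorted for deterministic tie-breaking
        score = jaccard(incorrect, correct[name])
        if best is None or score > best[1]:
            best = (name, score)
    return best


# Hypothetical invariant sets, written by hand for illustration only.
correct_submissions = {
    "alice.c": frozenset({"i >= 0", "i <= n", "sum >= 0"}),
    "bob.c": frozenset({"n >= 1", "max >= a[0]"}),
}
broken = frozenset({"i >= 0", "i <= n"})

match = closest_correct(broken, correct_submissions)
print(match[0])  # name of the correct submission to use as reference
```

A repair tool would then use the returned submission as the reference implementation for that one incorrect program, instead of a single lecturer-provided solution for all of them.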

How to use InvAASTCluster

  • To run InvAASTCluster on a new set of IPAs (e.g. new_dataset)

    To use InvAASTCluster on a new set of IPAs, new_dataset, the user should first divide the submissions into correct and semantically incorrect ones. Then, the user should put the correct submissions in the subdirectory correct_submissions/new_dataset and the incorrect submissions in the subdirectory incorrect_submissions/new_dataset.

    Afterward, the scripts bash_scripts/run_all_clustering.sh and bash_scripts/run_all_repairing.sh should be modified in order to use the new dataset.

  • To run InvAASTCluster on the available datasets of IPAs, ITSP and C-Pack-IPAs, the user should execute:

    • Only for clustering all the IPAs:
    bash ./bash_scripts/run_all_clustering.sh
    
    • For clustering and repairing:
    bash ./bash_scripts/run_all_repairing.sh
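
For a new dataset, the directory setup described in the first item above could be scripted roughly as follows. This is only a sketch: the helper `stage_dataset` is hypothetical, and it assumes a flat layout inside each dataset directory, which the instructions do not specify. The scripts in bash_scripts/ still have to be edited by hand afterward.

```python
import shutil
from pathlib import Path
from typing import Iterable


def stage_dataset(
    repo_root: Path,
    dataset: str,
    correct: Iterable[Path],
    incorrect: Iterable[Path],
) -> None:
    """Copy submissions into the two directories named in the instructions.

    Assumption: files are placed directly under <kind>/<dataset>/ with no
    further subdirectories per assignment.
    """
    targets = {
        repo_root / "correct_submissions" / dataset: correct,
        repo_root / "incorrect_submissions" / dataset: incorrect,
    }
    for target_dir, files in targets.items():
        target_dir.mkdir(parents=True, exist_ok=True)
        for src in files:
            # Keep the original file name; copy2 also preserves timestamps.
            shutil.copy2(src, target_dir / Path(src).name)
```

Calling `stage_dataset(Path("."), "new_dataset", correct_files, incorrect_files)` from the repository root would populate correct_submissions/new_dataset and incorrect_submissions/new_dataset, where `correct_files` and `incorrect_files` are lists the user has already split by correctness.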
    

References

Pedro Orvalho, Mikoláš Janota, and Vasco Manquinho. InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments. 2022. https://arxiv.org/pdf/2206.14175.pdf

Introductory Programming Assignments (IPAs) Datasets

Installation Requirements

  • Python 3.8.5

  • pycparser : version 2.21

    pip install pycparser==2.21
    
  • numpy : version 1.19.2

    pip install numpy==1.19.2
    
  • Clara

    Clara is the program repair framework used. It should be installed as a submodule in the subdirectory "InvAASTCluster/clara". To install Clara, follow the instructions available at https://github.com/iradicek/clara. The user should create a conda environment called "clara" so that our scripts can run Clara.

  • Daikon

    Daikon was used to compute likely invariants for each student submission. These invariants are generated dynamically from several program executions, using a set of predefined input-output tests for each programming assignment. To install Daikon, follow the instructions available at https://plse.cs.washington.edu/daikon/download/.

  • runsolver

    Runsolver was used to limit memory and CPU usage and to enforce a timeout while running the program clustering/repairing evaluations. To install Runsolver, follow the instructions available at https://github.com/utpalbora/runsolver.
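
Before running the scripts, it may help to confirm that the pinned Python packages listed above are the ones actually installed. The sketch below uses the standard-library `importlib.metadata` module (available from Python 3.8); the helper name `check_pinned` is hypothetical, and the check covers only the two pip packages, not Clara, Daikon, or runsolver.

```python
import sys
from importlib import metadata
from typing import Dict, List

# Versions listed in the installation requirements above.
PINNED = {"pycparser": "2.21", "numpy": "1.19.2"}


def check_pinned(pinned: Dict[str, str]) -> List[str]:
    """Return one human-readable problem per missing or mismatched package."""
    problems = []
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, wanted {wanted}")
    return problems


for line in check_pinned(PINNED):
    print(line)
# Report the interpreter version for comparison with the listed Python 3.8.5.
print("Python", ".".join(map(str, sys.version_info[:3])))
```

An empty result from `check_pinned(PINNED)` means both packages match the listed versions.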