Test-suite-based automated program repair techniques are promising, yet they suffer from the overfitting issue. We investigate the feasibility and effectiveness of test case generation in alleviating the overfitting issue, and propose two approaches (MinImpact and UnsatGuided) for using test case generation to improve test-suite-based repair. The effectiveness of the proposed approaches is evaluated on 224 bugs of the Defects4J repository. We use the state-of-the-art test generation tool EvoSuite in this study. This repository contains the source code used to run the experiment, the test cases generated by EvoSuite, the patches generated by our approaches, and our analysis of the correctness of the generated patches.
If you use this data, please cite:
Zhongxing Yu, Matias Martinez, Benjamin Danglot, Thomas Durieux and Martin Monperrus, "Alleviating Patch Overfitting with Automatic Test Generation: A Study of Feasibility and Effectiveness for the Nopol Repair System", In Empirical Software Engineering, Springer Verlag, 2018.
Bibtex Entry:
@article{yu:hal-01774223,
title = {Alleviating Patch Overfitting with Automatic Test Generation: A Study of Feasibility and Effectiveness for the Nopol Repair System},
author = {Yu, Zhongxing and Martinez, Matias and Danglot, Benjamin and Durieux, Thomas and Monperrus, Martin},
url = {https://hal.inria.fr/hal-01774223/file/alleviating_Overfitting.pdf},
journal = {{Empirical Software Engineering}},
publisher = {{Springer Verlag}},
year = {2018},
doi = {10.1007/s10664-018-9619-4},
}
We recently made a major revision of the above arXiv preprint and submitted the new version to the Empirical Software Engineering journal (EMSE). The new version is titled "Alleviating Patch Overfitting with Automatic Test Generation: A Study of Feasibility and Effectiveness for the Nopol Repair System". In this new version, we analyze the overfitting problem in program repair in depth and give a classification of this problem. We also systematically evaluate the effectiveness of UnsatGuided in alleviating different kinds of overfitting behaviours. Experimental data related to this new version are also included in this repository.
The src folder contains the source code used to run the experiment. Within it, the folder Nopol+UnsatGuided contains the code used to run the experiment of combining Nopol with UnsatGuided (the proposed approach for using test case generation to improve synthesis-based repair techniques), and the folder jGenProg+MinImpact contains the code used to run the experiment of combining jGenProg with MinImpact (the proposed approach for using test case generation to improve generate-and-validate repair techniques).
The results folder contains the experimental results. Similarly, the folders Nopol+UnsatGuided and jGenProg+MinImpact contain the results for combining Nopol with UnsatGuided and combining jGenProg with MinImpact, respectively. In particular, the folder Nopol+UnsatGuided contains the experimental data for our new submission to EMSE; inside that folder, we give a more detailed explanation of its structure.
The test generation tool used in this study is EvoSuite, and the EvoSuite version used corresponds to the commit ID 7a694a3aa2c5d4025c1ba0e1e9ef454398001a8b.
In particular, our experiment runs EvoSuite with the following setting:
" -Dassertion_timeout=1800 -Dminimization_timeout=1800 -Djunit_check_timeout=1800 -Dwrite_junit_timeout=300
-Dinitialization_timeout=300 -Dglobal_timeout=18000 -Dsearch_budget=100000 -Dstopping_condition=MaxStatements
-Dno_runtime_dependency=true -Dsandbox=false -Dp_reflection_on_private=0.0 -Dreflection_start_percent=1.0
-Dp_functional_mocking=0.0 -Dfunctional_mocking_percent=1.0 -mem 2000 "
We run EvoSuite 30 times with different seeds to account for the randomness of EvoSuite.
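The repeated runs described above can be sketched as a small command builder. This is only an illustration, not the script used in the study: the jar path, target class, classpath, and the seed values 1 to 30 are hypothetical placeholders, and the use of the EvoSuite command-line options -class, -projectCP, and -seed is an assumption about how each run is launched. The settings string is copied from the text above.

```python
import shlex

# Settings copied from this README.
EVOSUITE_SETTINGS = (
    "-Dassertion_timeout=1800 -Dminimization_timeout=1800 "
    "-Djunit_check_timeout=1800 -Dwrite_junit_timeout=300 "
    "-Dinitialization_timeout=300 -Dglobal_timeout=18000 "
    "-Dsearch_budget=100000 -Dstopping_condition=MaxStatements "
    "-Dno_runtime_dependency=true -Dsandbox=false "
    "-Dp_reflection_on_private=0.0 -Dreflection_start_percent=1.0 "
    "-Dp_functional_mocking=0.0 -Dfunctional_mocking_percent=1.0 "
    "-mem 2000"
)


def build_commands(jar, target_class, project_cp, runs=30, first_seed=1):
    """Build one EvoSuite command line per seed (nothing is executed here).

    Assumption: the seeds are consecutive integers starting at first_seed;
    the README does not state which seeds were actually used.
    """
    commands = []
    for seed in range(first_seed, first_seed + runs):
        cmd = [
            "java", "-jar", jar,
            "-class", target_class,      # hypothetical class under test
            "-projectCP", project_cp,    # hypothetical classpath of the buggy program
            "-seed", str(seed),          # a different seed for every run
        ]
        cmd += shlex.split(EVOSUITE_SETTINGS)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Placeholder arguments; replace them with real paths before running anything.
    for cmd in build_commands("evosuite.jar", "org.example.Foo", "target/classes"):
        print(" ".join(cmd))
```

Each printed line can be passed to a shell or to subprocess.run; keeping the builder separate from execution makes it easy to inspect the 30 command lines before starting the long-running generation.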