This repository provides the dataset accompanying our accepted papers on fault seeding and the evaluation of mutation testing techniques. It is designed to support empirical studies of the syntactic and semantic properties of artificially seeded faults (mutants) generated by the mutation testing tools PIT, μBERT, iBIR, and DeepMutation, and of real bugs from Defects4J. The papers are:
- Syntactic Vs. Semantic similarity of Artificial and Real Faults in Mutation Testing Studies
- On Comparing Mutation Testing Tools through Learning-based Mutant Selection
The source code of Cerebro, the deep learning approach employed for mutant selection, is available here.
The source code for running the simulation and for identifying the semantic and syntactic correlation between bugs and mutants is available in the Simutate repository.
To cite our papers or the dataset, please use the BibTeX entries available in cite.bib.
Important Note: Due to GitHub's restriction on file sizes, the dataset files are zipped into archives of at most 100 MB each.
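Because the data ships as zip archives, it has to be unpacked before use. The following is a minimal sketch, not part of the repository's tooling. It assumes that each archive is a standalone .zip file (not one part of a multi-part split archive) and that extracting each archive next to where it is stored gives the intended layout. The function name extract_archives is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archives(root: Path) -> list[Path]:
    """Extract every .zip found under root into the archive's own directory.

    Assumption: each archive is a complete, standalone zip file.
    Returns the list of archives that were processed.
    """
    # Collect the archives first so newly extracted files do not affect the walk.
    archives = sorted(root.rglob("*.zip"))
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
    return archives


if __name__ == "__main__":
    # Run from the root of a local clone of this repository.
    for path in extract_archives(Path(".")):
        print(f"extracted {path}")
```

If the archives turn out to be parts of a split archive, they would need to be joined with the tool that created them before extraction.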
The dataset is organized as follows:
- The source code of all 595 Defects4J bugs considered in our study is available in the projects_source_code_buggy directory.
- The source code of the fixes to all the bugs is available in the projects_source_code_fixed directory. These fixed versions were the ones considered for mutation.
- The details of the fixes to all the bugs, together with their tests, are available in the fixes_for_all_bugs_with_tests directory.
- The statements modified in the bug fixes are available in the changed_lines_to_fix_bugs directory.
- The details of the tests that fail for each bug are available in the groundtruth_bugs_failing_tests directory.
- The mutants generated by the mutation testing tools μBERT, iBIR, and DeepMutation are available in the mutants_generated_via_CodeBERT, mutants_generated_via_iBIR, and mutants_generated_via_DeepMutation directories, respectively.
- The details of the operators employed by the mutation testing tools are available in the mutation_operators_employed_by_mutation_testing_tools directory.
- The details of all the tests failed by the mutants generated via μBERT and DeepMutation are available in the failed_tests_by_CodeBERT_mutants and failed_tests_by_DeepMutation_mutants directories, respectively.
- The details and scores of the semantic comparison between the bugs and the mutants generated by μBERT, DeepMutation, and iBIR are available in the semantic_similarity_between_bugs_and_CodeBERT_mutants, semantic_similarity_between_bugs_and_DeepMutation_mutants, and semantic_similarity_between_bugs_and_iBIR_mutants directories, respectively.
- The details and scores of the syntactic comparison between the bugs and the mutants generated by μBERT, DeepMutation, and iBIR are available in the syntactic_similarity_between_bugs_and_CodeBERT_mutants, syntactic_similarity_between_bugs_and_DeepMutation_mutants, and syntactic_similarity_between_bugs_and_iBIR_mutants directories, respectively.
- For all the mutants generated by the mutation testing tools, the details of which mutants the deep learning based mutant selection approach Cerebro predicted as subsuming (and non-subsuming), based on its n-fold cross-evaluation during training, are available in the Cerebro_predicted_mutants directory.
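As a quick sanity check after unpacking, the layout described above can be verified with a short script. This is a sketch under assumptions: the directory names are copied from the list above, they are assumed to sit directly under the repository root once extracted, and the helper name missing_directories is hypothetical.

```python
from pathlib import Path

TOOLS = ("CodeBERT", "DeepMutation", "iBIR")

# Directory names as listed in this README.
EXPECTED_DIRECTORIES = [
    "projects_source_code_buggy",
    "projects_source_code_fixed",
    "fixes_for_all_bugs_with_tests",
    "changed_lines_to_fix_bugs",
    "groundtruth_bugs_failing_tests",
    "mutation_operators_employed_by_mutation_testing_tools",
    "failed_tests_by_CodeBERT_mutants",
    "failed_tests_by_DeepMutation_mutants",
    "Cerebro_predicted_mutants",
]
EXPECTED_DIRECTORIES += [f"mutants_generated_via_{tool}" for tool in TOOLS]
EXPECTED_DIRECTORIES += [
    f"{kind}_similarity_between_bugs_and_{tool}_mutants"
    for kind in ("semantic", "syntactic")
    for tool in TOOLS
]


def missing_directories(root: Path) -> list[str]:
    """Return the expected directory names that are absent under root."""
    return [name for name in EXPECTED_DIRECTORIES if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: run from the repository root after extracting the archives.
    missing = missing_directories(Path("."))
    if missing:
        print("Missing directories:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected directories are present.")
```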
Feel free to explore and use the dataset in your research and testing evaluations. If you have any questions or need further clarification, please reach out. Thank you for your interest and collaboration.