The requirements for running all the scripts are listed in REQUIREMENTS.md.
We followed a two-phase data extraction methodology to analyze the documentation of 11 open-source GitHub projects (listed in Table 1 of the paper).
- Phase 1 prepares data for RQ1 by randomly sampling functions from each project. The data extraction script and the steps required to run it can be found in RQ1_scripts.
- Phase 2 prepares data for RQ2, RQ3, RQ4, and RQ5 by extracting, from the commit logs, the commits in which only the documentation of a method was changed. The scripts, the instructions for running them, and the time required to run them on each project can be found in RQ2_scripts.
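This README does not state how Phase 1 enumerates and samples functions, so the following is only a minimal sketch of the idea, assuming Python source files parsed with the standard `ast` module. The names `list_functions` and `sample_functions`, and the fixed `seed`, are hypothetical and are not taken from RQ1_scripts.

```python
import ast
import random


def list_functions(source: str, filename: str = "<unknown>") -> list[str]:
    """Return dotted names of all functions and methods defined in `source`."""
    tree = ast.parse(source, filename=filename)
    names: list[str] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                names.append(name)
                visit(child, name + ".")  # nested functions
            elif isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + ".")  # methods get a "Class." prefix
            else:
                visit(child, prefix)  # e.g. defs inside if/try blocks

    visit(tree, "")
    return names


def sample_functions(sources: dict[str, str], k: int = 50, seed: int = 0) -> list[tuple[str, str]]:
    """Randomly pick up to k (file, function) pairs from one project's source files.

    Hypothetical helper: `sources` maps file paths to file contents.
    """
    candidates = [
        (path, name)
        for path, src in sorted(sources.items())  # sort paths for reproducibility
        for name in list_functions(src, path)
    ]
    rng = random.Random(seed)  # fixed seed so the same sample can be regenerated
    return rng.sample(candidates, min(k, len(candidates)))
```

Sorting the file paths and seeding a private `random.Random` instance means that rerunning the sketch on the same checkout yields the same 50 functions, which is convenient when the sample is later annotated by hand.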
As output, we obtained a dataset consisting of:
- 50 functions from each project
- all commits from each project within a 2-year span (June 15, 2018 to June 15, 2020) that changed a method's documentation, together with their associated attributes, such as the method signature and the changed line numbers.
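The README does not describe how a commit is classified as changing only a method's documentation. One plausible check is sketched below, assuming the methods are Python and that the old and new versions of a method are available as dedented source strings: it compares the code with all docstrings stripped, and then compares the docstrings and comments. The helper names and the inclusive date bounds are hypothetical; candidate commits for the window could be listed with `git log --since=2018-06-15 --until=2020-06-15`, although RQ2_scripts may work differently.

```python
import ast
import io
import tokenize
from datetime import date

# Study window from this README; treating both ends as inclusive is an assumption.
STUDY_START = date(2018, 6, 15)
STUDY_END = date(2020, 6, 15)

_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def in_study_window(commit_date: date) -> bool:
    """True if a commit date falls inside the 2-year study window."""
    return STUDY_START <= commit_date <= STUDY_END


def _code_without_docstrings(source: str) -> str:
    """Dump the AST with every docstring removed, so only executable code remains."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, _DOC_OWNERS) and ast.get_docstring(node, clean=False) is not None:
            node.body = node.body[1:] or [ast.Pass()]
    return ast.dump(tree)  # line/column attributes are excluded by default


def _documentation(source: str) -> tuple[list, list]:
    """Collect docstrings and comments, the two kinds of documentation a method can carry."""
    tree = ast.parse(source)
    docstrings = [ast.get_docstring(n, clean=False) for n in ast.walk(tree) if isinstance(n, _DOC_OWNERS)]
    comments = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    ]
    return docstrings, comments


def is_doc_only_change(old_source: str, new_source: str) -> bool:
    """True if the code is unchanged but a docstring or comment was edited."""
    same_code = _code_without_docstrings(old_source) == _code_without_docstrings(new_source)
    return same_code and _documentation(old_source) != _documentation(new_source)
```

Because comments never reach the AST and docstrings are stripped before comparison, any edit to executable code makes `is_doc_only_change` return `False`. A commit that only reformats code (for example, by re-wrapping a long expression) changes neither the stripped AST nor the documentation, so it is not counted either.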
We manually analyzed the resulting dataset for the features discussed in the paper; the annotated dataset is shared in Dataset_Observations.xlsx. Through the Artifact Evaluation Track, we aim to make all the scripts and the dataset AVAILABLE and REUSABLE for further research.
Author information can be found in AUTHORS.md.