Modeling Developer Expertise with Zipf's law

ZipfModeling is an expertise model that identifies expert developers based on how they write code.
"tokenizer" tokenizes Python source code into its AST nodes and then collects the resulting syntax patterns.
"ZipfModeling_syndt" shows the probability distribution of the Zipf parameter alpha after fitting on synthetic data.
"Model_train_realdt" and "Zipf_logLikelihood" collect syntax patterns from real projects fetched from GitHub and fit each developer's distribution of syntax patterns with Zipf's law.
"Zipf-Validity" explores threats to the validity of fitting the data with the Zipf distribution.
"sourcecodeAnalyzer" shows, with a few examples, how to convert code into an AST and then collect its nodes as syntax patterns.
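As a rough illustration of the AST step described above, the following minimal sketch parses a Python snippet with the standard `ast` module and counts node type names. Treating each node type name as one "syntax pattern" is an assumption made for this example, and the helper `syntax_patterns` is a hypothetical name; the notebooks may define patterns differently.

```python
import ast
from collections import Counter


def syntax_patterns(source: str) -> Counter:
    """Count AST node type names in a piece of Python source.

    Assumption: one node type name == one syntax pattern.
    """
    tree = ast.parse(source)
    # ast.walk visits every node in the tree, in no guaranteed order.
    return Counter(type(node).__name__ for node in ast.walk(tree))


sample = '''
def square_all(values):
    return [v * v for v in values]
'''

patterns = syntax_patterns(sample)
for name, count in patterns.most_common():
    print(name, count)
```

The resulting counter gives, for one snippet, the kind of frequency table that can then be aggregated per developer.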
We generate a labeled synthetic dataset by resampling real data. This dataset contains 1200 developers in two categories, "Expert" and "Novice". With this data you can explore the validity of expertise models that are based on the content of source code.
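Fitting Zipf's law to one developer's syntax pattern counts can be sketched as a maximum-likelihood estimate of the exponent alpha. The version below is a minimal sketch, not the notebooks' implementation: it assumes a finite Zipf law over ranks 1..N, where p(rank) is proportional to rank^(-alpha), and it uses a plain ternary search, which is valid here because the log-likelihood is concave in alpha. The function names, the search bounds, and the example counts are all hypothetical.

```python
import math


def zipf_log_likelihood(alpha, freqs):
    """Log-likelihood of rank-ordered counts under a finite Zipf law.

    freqs[i] is the count of the pattern with rank i + 1, and
    p(rank) = rank**(-alpha) / sum(k**(-alpha) for k in 1..N).
    """
    n = len(freqs)
    log_norm = math.log(sum(k ** -alpha for k in range(1, n + 1)))
    return sum(f * (-alpha * math.log(r) - log_norm)
               for r, f in enumerate(freqs, start=1))


def fit_zipf_alpha(counts, lo=0.0, hi=10.0, iters=100):
    """Estimate alpha by ternary search on the log-likelihood.

    Assumption: the optimum lies in [lo, hi].
    """
    # Rank patterns from most to least frequent.
    freqs = sorted(counts.values(), reverse=True)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if zipf_log_likelihood(m1, freqs) < zipf_log_likelihood(m2, freqs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Hypothetical pattern counts for two developers.
steep = {"Name": 120, "Call": 60, "Attribute": 40, "Assign": 30, "If": 24}
flat = {"Name": 50, "Call": 50, "Attribute": 50, "Assign": 50, "If": 50}

alpha_steep = fit_zipf_alpha(steep)
alpha_flat = fit_zipf_alpha(flat)
print(round(alpha_steep, 3), round(alpha_flat, 3))
```

Counts that fall off as 1/rank yield an exponent close to 1, while uniform counts yield an exponent close to 0, so alpha summarizes how concentrated a developer's usage is on a few patterns.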
"Dataset" contains the raw real dataset extracted from GitHub repositories. The dataset includes the following features:
'commit_ID', 'Author', 'Authored_Date', 'email', 'msg', 'Commiter', 'committer_date', 'project_path', 'Commit_before', 'Commit_after', 'diff', 'Added_LOC', 'Removed_LOC', 'Num_LOC'
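To show how these commit-level features might be grouped per developer, here is a small sketch that uses only the standard library. It assumes the data is available as CSV, which the description above does not state, and it uses three made-up rows with a subset of the listed columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-commit sample with a subset of the listed columns.
sample_csv = """commit_ID,Author,Added_LOC,Removed_LOC
a1,alice,10,2
b2,bob,5,0
c3,alice,7,3
"""

# Per-author totals: number of commits and lines added/removed.
totals = defaultdict(lambda: {"commits": 0, "added": 0, "removed": 0})
for row in csv.DictReader(io.StringIO(sample_csv)):
    entry = totals[row["Author"]]
    entry["commits"] += 1
    entry["added"] += int(row["Added_LOC"])
    entry["removed"] += int(row["Removed_LOC"])

for author, entry in sorted(totals.items()):
    print(author, entry)
```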

Citation

Assessing developer expertise from the statistical distribution of programming syntax patterns

@inproceedings{,
  title={Assessing developer expertise from the statistical distribution of programming syntax patterns},
  author={Moradi Dakhel, Arghavan and Desmarais, Michel C. and Khomh, Foutse},
  booktitle={Evaluation and Assessment in Software Engineering},
  pages={90--99},
  year={2021}
}
