Here, you will find tools and datasets related to the research done by SCANL.
SCANL stands for the Source Code and Natural Language Laboratory. We are a diverse team of scientists dedicated to studying the latent connection between source code behavior and the natural language elements used to describe that behavior. Feel free to visit https://www.scanl.org/ to learn more about who is part of the lab and to find out more about our goals and research motivations.
We have tools, datasets, and learning/educational resources. We will briefly describe each below, but refer to their individual repositories for more information.
Name | Description |
---|---|
Identifier Name Structure Catalogue | A catalogue of identifier name structures found in code and their significance to program behavior. This catalogue also covers various perspectives on how research literature characterizes identifier name meaning and behavior. |
Ensemble Tagger | A part-of-speech tagger designed to work on the specialized phrase structure of identifiers (e.g., variable names). |
IDEAL | An identifier name appraisal and recommendation tool. |
srcML Identifier Getter Tool | A tool for collecting samples of identifier names from software systems using srcML. It can help you take statistically sound samples for research on identifier names. |
Project Sunshine | An implementation of the linguistic anti-patterns. It will soon be merged with IDEAL to create a framework for identifier name appraisal and recommendation. |
Datasets | The current home of the abbreviation study data set, the grammar patterns data set, and the ensemble tagger train/test data set. |
We also host some (potentially modified) tools that other researchers made. We modified SWUM to act primarily as a part-of-speech tagger. POSSE has likewise been modified so that we can use it more easily as a part-of-speech tagger; the Ensemble Tagger (mentioned above) uses it. The other public repositories are student projects, sample code, or miscellaneous tools that we use internally (i.e., not explicitly meant to be easy for others to use).
If you have trouble with any of our tools or datasets, please open an issue! In addition, if you like what we do, leave a star on the project. It helps us know what to focus our maintenance efforts on and what kind of content people want to see most!