
ianbandrade/DATA-SCIENCE_LAB


Classification of issues from GitHub repositories

Foundation

This project was developed for the Measurement and Experimentation Laboratory in Software Engineering course. Its content was translated and adapted from this repository.

Introduction


GitHub repositories have a section dedicated to Issues. Issues are topics opened by a repository's users and contributors to report problems, ask questions, disclose vulnerabilities, and so on.

One example is the issues page of the React repository. Some issues there carry a label (for example, 'Type: Bug'), but the label usually has to be entered manually by the person submitting the issue. Because issues are often not labeled correctly, repository maintainers fail to identify many of the bugs that users and contributors report.

The aim of this project is to create a mechanism that identifies whether an issue reports a bug, so that issues can be classified automatically in the future. This will let the developers responsible for a repository filter reported bugs more effectively.

Dataset


For this project, we use a pre-processed sample of the GitHub Bugs Prediction dataset, available on the Kaggle community platform.

The dataset consists of three columns:

  • Title - The title of the GitHub Issue
  • Body - The GitHub Issue body
  • Label - Represents the label of that issue (Bug; Feature; Question)
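
The three columns above can be turned into a bug / not-bug target with a few lines of Python. The following is a minimal sketch, not the project's actual loading code: it assumes the sample is a CSV file with the header `Title,Body,Label` and that labels are stored as the strings `Bug`, `Feature`, and `Question` (the pre-processed sample may encode them differently). The `load_issues` helper and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """Title,Body,Label
App crashes on startup,Stack trace attached,Bug
Add dark mode,It would be nice to have a dark theme,Feature
How do I configure the proxy?,I could not find this in the docs,Question
"""


def load_issues(csv_file):
    """Read issues and attach a binary target: True if the issue is a bug."""
    issues = []
    for row in csv.DictReader(csv_file):
        issues.append({
            # Title and body are joined so a classifier sees the full text.
            "text": f"{row['Title']} {row['Body']}",
            # Assumption: labels are the strings Bug, Feature, Question.
            "is_bug": row["Label"].strip().lower() == "bug",
        })
    return issues


issues = load_issues(io.StringIO(SAMPLE_CSV))
for issue in issues:
    print(issue["is_bug"], issue["text"])
```

Collapsing Feature and Question into a single "not a bug" class matches the stated aim of deciding whether an issue reports a bug. To read a real file, pass an open file handle to `load_issues` instead of the in-memory sample.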
