Genetic-Algorithm-Based-Cost-Sensitive-Learning

A genetic algorithm based approach for cost sensitive learning, in which the misclassification cost is considered together with the cost of feature extraction.

Dataset:

Thyroid Disease Data Set

When the class label distribution is examined, it is observed that 92% of training set belongs to a single output label. Regardless of the chosen classifier, this distribution tends to learn a bias through the majority label. To deal with this issue, oversampling method is used to create a more balanced dataset. The samples with minority labels are duplicated 20 times in both training and test dataset ending up with a training set with 9452 samples and test set with 8428 samples.

Approach:

When a classifier is applied on a dataset, most of the data attributes are either irrelevant to the evaluation or has a very little impact on it. Also, it is known that each attribute has a cost of getting it. The cost list for my dataset is available here. To be able to find the best attribute set for classification, a genetic algorithm based cost-sensitive learning model is implemented. For my genetic algorithm, I start my population with 10 hypotheses (each one corresponding to a classifier model with different set of attributes). For each member of population, a fitness value is found with respect to the following fitness function:

After creating the population, I iterated 200 generations with mutation and cross-over operations on the population. Each new created hypothesis is ranked with respect to the fitness values. Best fitness valued offsprings are replaced with the worst fitness valued hypotheses in the population. By applying this, the number of members in population is kept constant.

At the end of each generation, the average fitness value of 10 members are stored and shown in the graph below.

Average Fitness Value of Each Generation

It is noticeable that after generation 70, the increase in the fitness function slows down and almost saturates.

A more detailed technical report is available here.

Emre Doğan

Bilkent University

Department of Computer Engineering

December 18, 2018

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
data		data
doc		doc
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Genetic-Algorithm-Based-Cost-Sensitive-Learning

Dataset:

Approach:

About

Releases

Packages

Languages

emredogan7/genetic-algorithm-based-cost-sensitive-learning

Folders and files

Latest commit

History

Repository files navigation

Genetic-Algorithm-Based-Cost-Sensitive-Learning

Dataset:

Approach:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages