Cal Poly Senior Project 2018-2019
Results of running oversampling, undersampling, and SMOTE sampling methods on various datasets using the WEKA API.
Guide to Files
Each dataset has a folder. Within the folder, you will find the following files/directories:
- <dataset name>Undersampling - contains 100 result files from the dataset being undersampled
- <dataset name>Oversampling - contains 100 result files from the dataset being oversampled
- <dataset name>SMOTE - contains 100 result files from the dataset being sampled with SMOTE
- <dataset name>_A.arff - the unsampled (no sample) file (for training)
- <dataset name>_B.arff - the unsampled (no sample) file (for testing)
- <dataset name>.csv - the original dataset prior to sampling, column removal, and splitting into A and B.
- <dataset name>_A.csv - the dataset prior to sampling and non-featured column removal, but post splitting into A and B (for training)
- <dataset name>_B.csv - the dataset prior to sampling and non-featured column removal, but post splitting into A and B (for testing)
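
The layout above can be checked with a short script. The following is a minimal Python sketch, not part of the project itself. It assumes that each dataset folder is named exactly after the dataset and that the file names follow the patterns listed above with the same casing. The helper names `expected_entries` and `missing_entries` and the repository root path are hypothetical.

```python
from pathlib import Path


def expected_entries(root, dataset):
    """Build the paths that the file guide lists for one dataset folder.

    Assumption: the folder is named exactly like the dataset.
    """
    folder = Path(root) / dataset
    names = [
        f"{dataset}Undersampling",  # directory of undersampling result files
        f"{dataset}Oversampling",   # directory of oversampling result files
        f"{dataset}SMOTE",          # directory of SMOTE result files
        f"{dataset}_A.arff",        # unsampled training file
        f"{dataset}_B.arff",        # unsampled testing file
        f"{dataset}.csv",           # original dataset
        f"{dataset}_A.csv",         # training split before column removal
        f"{dataset}_B.csv",         # testing split before column removal
    ]
    return [folder / name for name in names]


def missing_entries(root, dataset):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_entries(root, dataset) if not path.exists()]


if __name__ == "__main__":
    # "." is a placeholder for the repository root.
    for path in missing_entries(".", "Ant"):
        print("missing:", path)
```

A dataset that ships only part of the results may report missing entries here without anything being wrong, so the output is best read as a prompt to look closer and not as an error.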
RQ3 Details
This repository contains the following datasets for RQ3 Undersampling and Oversampling:
- Ant
- Camel
- Ivy
- Jedit
- Keymind-A
- Keymind-B
- Log4j
- Lucene
- Poi
- Prop (only Undersampling 1%)
- Synapse
- Velocity
- Xalan
- Xerces