Two datasets (project3_dataset1 and project3_dataset2) have been given along with demo datasets (project3_dataset3_test, project3_dataset3_train, project3_dataset4). Please check the Data_Format file first for a short description of the two datasets.
The following tasks have been accomplished:
- Implement three classification algorithms by yourself: Nearest Neighbor, Decision Tree, and Naïve Bayes.
- Implement Random Forests based on your own implementation of Decision Tree.
- Implement Boosting based on your own implementation of Decision Tree.
- Use 10-fold Cross Validation to evaluate the performance of all methods on the two provided datasets in terms of Accuracy, Precision, Recall, and F1 measure.
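The evaluation step in the last task can be sketched roughly as follows. This is a minimal illustration in plain Python, not the submitted implementation: the function names (`metrics`, `k_fold_indices`, `cross_validate`), the `fit`/`predict` callables, and the assumption that labels are binary with `1` as the positive class are all hypothetical and should be checked against the Data_Format file.

```python
import random


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels.

    Assumes `positive` marks the positive class (an assumption, not
    something stated in the project description).
    """
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Average each metric over k folds.

    `fit(train_X, train_y)` returns a model and `predict(model, test_X)`
    returns a list of labels; both are placeholders for any of the five
    classifiers.
    """
    folds = k_fold_indices(len(X), k, seed)
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = predict(model, [X[j] for j in test_idx])
        fold_scores = metrics([y[j] for j in test_idx], preds)
        for name in totals:
            totals[name] += fold_scores[name]
    return {name: value / k for name, value in totals.items()}
```

Averaging per-fold scores is one common convention; pooling the confusion counts across folds before computing the ratios is another, and the two can give different precision and recall on small or imbalanced folds.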
The final submission includes the following:
- Code: Implementations of all five methods. All the methods must be implemented by yourself. Existing packages or online code for the algorithms are not allowed.
- Report: Describe the flow of all the implemented methods, and describe the choices you made (such as parameter settings, pre-processing, post-processing, how to deal with overfitting, etc.). Compare their performance, and state their pros and cons based on your findings.
Please note:
- New datasets will be given to check your implemented classification methods and performance measures. The data format will be consistent with the Data_Format file that we already provided.
- During the demo, you will be asked to use specific settings and run your code.