Access the project here
In this playground project, I explored three tree-based algorithms (Decision Tree, Random Forest, and Gradient Boosting Machine) on the Forest Cover Type dataset (from UCI Machine Learning Repository). The dataset takes forestry data from four wilderness areas in Roosevelt National forest in northern Colorado. The actual forest cover type for a given observation (30x30 meter cell) was determined from US Forest Service Region 2 Resource Information System data. The four wilderness areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes rather than forest management practices. Kaggle also included this dataset in one of their ML competitions for the machine learning community to use for fun and practice. The objective is to predict the classification for the forest cover type based on given features.