Decision Trees are powerful algorithms, capable of fitting complex datasets. A decision tree makes predictions based on a series of if/else conditions, splitting a node into two or more sub-nodes.
For all its versatility, the decision tree is also prone to overfitting. One of the reasons this algorithm often overfits is its depth: it tends to memorize all the patterns in the training data but struggles to perform well on unseen data (the validation or test set).
To overcome the overfitting problem, we can reduce the complexity of the algorithm by limiting its depth. A decision tree with a depth of 1 is called a decision stump and has only one split from the root.
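The sketch below illustrates this effect on a synthetic dataset (not the course data): an unrestricted tree typically gets near-perfect AUC on the training set but a noticeably lower AUC on the validation set, while a shallow tree generalizes better.

```python
# Sketch only: a synthetic dataset used to show how max_depth
# controls overfitting in a DecisionTreeClassifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=1)

for depth in [None, 1, 3, 6]:  # None lets the tree grow until the leaves are pure
    dt = DecisionTreeClassifier(max_depth=depth, random_state=1)
    dt.fit(X_train, y_train)

    auc_train = roc_auc_score(y_train, dt.predict_proba(X_train)[:, 1])
    auc_val = roc_auc_score(y_val, dt.predict_proba(X_val)[:, 1])
    print(f"max_depth={depth}: train auc={auc_train:.3f}, val auc={auc_val:.3f}")
```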
Classes, functions, and methods:

* `DecisionTreeClassifier`: classification model from the `sklearn.tree` module.
* `max_depth`: hyperparameter to control the maximum depth of the decision tree.
* `export_text`: function from `sklearn.tree` that displays a text report showing the rules of a decision tree.
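As a small, hedged illustration of these pieces working together, the sketch below fits a decision stump on a few made-up records and prints its rules with `export_text` (the feature names and labels are invented for the example):

```python
# Sketch: train a decision stump on made-up data and print its if/else rules.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

train_dicts = [
    {"assets": 0, "debt": 1000},
    {"assets": 2000, "debt": 1000},
    {"assets": 3000, "debt": 300},
    {"assets": 5000, "debt": 50},
]
y_train = [0, 0, 1, 1]  # hypothetical labels: 1 = ok, 0 = default

dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(train_dicts)

dt = DecisionTreeClassifier(max_depth=1)  # decision stump: a single split from the root
dt.fit(X_train, y_train)

# export_text shows the learned if/else rules as plain text
print(export_text(dt, feature_names=list(dv.get_feature_names_out())))
```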
Note: we have already covered `DictVectorizer` in session 3 and `roc_auc_score` in session 4.
The notes are written by the community. If you see an error here, please create a PR with a fix.