Decision Trees
Conclusions
Decision Trees can classify data that is not linearly separable.
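As a minimal illustration of non-linear separability, assuming scikit-learn is available: the XOR pattern cannot be split by any single line, yet an unconstrained tree fits it exactly with two axis-aligned splits.

```python
from sklearn.tree import DecisionTreeClassifier

# XOR data: no single straight line separates class 0 from class 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(list(tree.predict(X)))  # [0, 1, 1, 0]
```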
Gini impurity or entropy can be used to measure how homogeneous the classes in a node are. Gini impurity is more common because it is cheaper to compute (no logarithms).
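The two impurity measures can be sketched in a few lines of plain Python. Both are 0 for a pure node and largest for an even class mix; the function names here are illustrative.

```python
from collections import Counter
import math

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum of p * log2(p) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A perfectly mixed binary node is maximally impure under both measures.
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5
print(entropy(["a", "a", "b", "b"]))        # 1.0
```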
A Decision Tree stops splitting a node when its impurity reaches 0, when the number of samples in the node falls below a minimum threshold, or when the maximum tree depth is reached.
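These three stopping conditions can be summarized in one predicate. This is a sketch, not any library's actual API; min_samples and max_depth are illustrative hyperparameter names.

```python
def should_stop(impurity, n_samples, depth, min_samples=5, max_depth=4):
    """Return True when any stopping condition for splitting a node is met."""
    return (
        impurity == 0.0             # node is pure
        or n_samples < min_samples  # too few samples to split further
        or depth >= max_depth       # tree has reached its depth limit
    )

print(should_stop(impurity=0.0, n_samples=50, depth=2))  # True (pure node)
print(should_stop(impurity=0.3, n_samples=50, depth=2))  # False (keep splitting)
```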
Overfitting often happens when a decision tree is allowed to grow too large.
Plot the misclassification cost on held-out data against different maximum depth values to choose an acceptable max_depth value.
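This depth sweep could look like the sketch below, assuming scikit-learn and a synthetic dataset; in practice you would plot the costs dictionary rather than just take its minimum.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Misclassification cost on held-out data = 1 - accuracy, per max_depth.
costs = {}
for depth in range(1, 11):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    costs[depth] = 1.0 - tree.score(X_te, y_te)

best_depth = min(costs, key=costs.get)
print(best_depth, costs[best_depth])
```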
Random Forest overcomes the high variance of individual Decision Trees by training many trees and predicting the class that receives the most votes.
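The voting step can be sketched in plain Python: given each tree's predictions, the forest outputs the most common class per sample. The hypothetical per-tree predictions below stand in for real fitted trees.

```python
from collections import Counter

def majority_vote(predictions_per_tree):
    """Each inner list holds one tree's predictions; the forest
    predicts the most common class across trees, per sample."""
    n_samples = len(predictions_per_tree[0])
    return [
        Counter(tree[i] for tree in predictions_per_tree).most_common(1)[0][0]
        for i in range(n_samples)
    ]

# Three hypothetical trees vote on four samples.
trees = [
    ["cat", "dog", "dog", "cat"],
    ["cat", "cat", "dog", "dog"],
    ["dog", "cat", "dog", "cat"],
]
print(majority_vote(trees))  # ['cat', 'cat', 'dog', 'cat']
```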